Smart vision sensor systems enable many computer vision applications, such as autonomous drones and wearable devices. These battery-powered devices have very stringent power consumption requirements. Close-to-sensor feature extraction, which compresses the full image into descriptive keypoints, is crucial because it allows for several design optimizations. First, the amount of on-chip memory required can be reduced. Second, the volume of data that needs to be exchanged between nodes in Internet of Things (IoT) applications can also be reduced. This work explores the use of an Application-Specific Instruction Set Processor (ASIP) tailored to perform energy-efficient feature extraction in real time. The ASIP features a Very Long Instruction Word (VLIW) central core comprising one RV32I RISC-V and three vector slots. The on-chip memory subsystem implements parallel multi-bank memories with near-memory data shuffling to enable single-cycle multi-pattern vector access. As a case study, Oriented FAST and Rotated BRIEF (ORB) is used to evaluate the proposed architecture. We show that the architecture supports VGA-resolution images at 140 frames per second (FPS) for one scale, while reducing the number of memory accesses by two orders of magnitude compared to other embedded general-purpose architectures.
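
To put the reported frame rate in perspective, the short sketch below works out the pixel throughput and per-frame time budget implied by VGA resolution at 140 FPS. It assumes the standard VGA frame size of 640 x 480 pixels; the function names `pixel_rate` and `frame_budget_ms` are hypothetical helpers introduced only for this illustration and are not part of the proposed architecture.

```python
# Back-of-the-envelope throughput for VGA at 140 FPS.
# Assumption: VGA means 640 x 480 pixels (the standard definition).

VGA_WIDTH = 640
VGA_HEIGHT = 480
FPS = 140  # frame rate reported in the text, for one scale


def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels that must be processed per second at the given frame rate."""
    return width * height * fps


def frame_budget_ms(fps: int) -> float:
    """Time available to process a single frame, in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    rate = pixel_rate(VGA_WIDTH, VGA_HEIGHT, FPS)
    print(f"Pixels per frame : {VGA_WIDTH * VGA_HEIGHT}")
    print(f"Pixels per second: {rate}")
    print(f"Frame budget     : {frame_budget_ms(FPS):.2f} ms")
```

Under this assumption, the processor has to sustain roughly 43 million pixels per second, with about 7 ms available per frame.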