GreenWaves Technologies, a fabless semiconductor startup based in Grenoble, France, has designed the GAP8 IoT application processor, which is based on the RISC-V architecture and optimized for image and audio algorithms, including convolutional neural network (CNN) inference. It achieves high energy efficiency thanks to an 8-core computational cluster combined with a convolution hardware accelerator. The design is based on the open-source Parallel Ultra Low Power (PULP) computing platform.
The new processor targets industrial and consumer products integrating artificial intelligence and advanced classification, such as image recognition, counting people and objects, machine health monitoring, home security, speech recognition, consumer robotics, wearables, and smart toys.
GAP8 specifications:
- 1x extended RISC-V fabric controller core with 16 kB data and 4 kB instruction cache for system control
- 8x extended RISC-V compute cores with 64 kB shared data memory and 16 kB shared instruction cache
- 1x Hardware optimized synchronization unit
- 1x Hardware Convolution Engine (HWCE)
- Multi channel 1D/2D DMA, specialized multi-channel micro DMA for autonomous peripheral support
- Programmable Voltage Regulator
- Real Time Clock
- 2x programmable clocks
- Secured execution support with Memory Protection Unit
- 512 kB State Retentive L2 Memory
- Optional external high speed low power SDRAM up to 16 MB, through HyperBus
- 32 kHz external quartz, up to 250 MHz internal clock
- I/O interfaces
  - 128 Mb/s LVDS IEEE compliant
  - Serial I/Q
  - Quad SPI Master + additional SPI Master, SPI Slave
  - 1x I2S
  - 1x I2C
  - 1x Camera parallel interface
  - HyperBus (External Flash and RAM)
  - Up to 32 GPIOs
  - 4x PWM
- Supply Voltage
  - 1.2 V down to 1 V core VDD supply
  - 1.8 V to 3.3 V for I/Os
- aQFN 84 package
The processor is capable of delivering up to 8 GOPS at a few tens of mW, or up to 200 MOPS at 1 mW, thanks in part to a single-cycle 5×5 convolution. The company compared the (theoretical) performance of GAP8 and the STM32H7 (Cortex-M7) MCU for a CNN graph, and the new processor shows a massive advantage for that particular task.
|Processor|Frequency|Execution time|Cycles|Active power|
|---|---|---|---|---|
|STM32H7|216 MHz|99.1 ms|21 405 600|60 mW (STM32H7)|
|GAP8|15.4 MHz|99.1 ms|1 527 232|3.7 mW|
|GAP8|175 MHz|8.7 ms|1 527 232|70 mW|
If GAP8 is configured to run at 15.4 MHz, it can complete the task as fast as the STM32H7 while using only a fraction of the power, or it can run the task over 10 times faster when clocked at 175 MHz with only slightly higher active power. Another way to look at power consumption is the company’s claim that the processor can classify a QVGA image every three minutes for 10 years on a small 3.6 Wh battery.
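The figures above can be cross-checked with a little arithmetic: energy per inference is active power multiplied by execution time, and the battery claim translates into an average power budget. The sketch below uses only the numbers quoted in this article. The function names are illustrative, and the battery calculation assumes, for simplicity, that the full 3.6 Wh is usable with no self-discharge.

```python
# Back-of-the-envelope check of the figures quoted in the article.
# Function names are illustrative; only the input numbers come from the text.

def energy_per_inference_mj(power_mw: float, time_ms: float) -> float:
    """Energy in millijoules (mW * ms gives microjoules, hence / 1000)."""
    return power_mw * time_ms / 1000.0


def battery_budget(capacity_wh: float, years: float, interval_min: float):
    """Return (average power in microwatts, energy per interval in millijoules).

    Assumption: the whole battery capacity is usable and nothing is lost
    to self-discharge or conversion.
    """
    total_joules = capacity_wh * 3600.0
    lifetime_s = years * 365 * 24 * 3600
    intervals = lifetime_s / (interval_min * 60)
    avg_power_uw = total_joules / lifetime_s * 1e6
    per_interval_mj = total_joules / intervals * 1e3
    return avg_power_uw, per_interval_mj


if __name__ == "__main__":
    rows = [
        ("STM32H7 @ 216 MHz", 60.0, 99.1),
        ("GAP8 @ 15.4 MHz", 3.7, 99.1),
        ("GAP8 @ 175 MHz", 70.0, 8.7),
    ]
    for name, power_mw, time_ms in rows:
        print(f"{name}: {energy_per_inference_mj(power_mw, time_ms):.3f} mJ")

    avg_uw, per_mj = battery_budget(3.6, years=10, interval_min=3)
    print(f"Average power budget: {avg_uw:.1f} uW")
    print(f"Energy budget per 3-minute interval: {per_mj:.2f} mJ")
```

Under these assumptions, the STM32H7 run costs about 5.9 mJ per inference, against roughly 0.37 mJ for GAP8 at 15.4 MHz and 0.61 mJ at 175 MHz. The battery claim corresponds to an average draw of about 41 µW, or about 7.4 mJ per three-minute interval, sleep included.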
Some typical use cases include:
- Always-on face detection with a few mWs of power
- Indoor people counting / presence detection with years of autonomy
- Sub $15 machine vision and voice control solutions for consumer robotics
- Single-chip processing for 4 microphone voice capture and 10-word speaker-independent keyword spotting
The development kit includes:
- Memory / Storage – 256Mbits SPI flash, I2C EEPROM, HyperBus combo DRAM/Flash 512Mbits Flash + 64Mbits DRAM
- Camera connector for an external camera (e.g. Himax HM01B0)
- USB port
- USB to GAP8 JTAG + UART
- Misc – Reset button, Configurable I/O voltage
- Battery holder (SAF17500), DC connector
- Arduino Uno compatible Master/Shield
GAP8 can be programmed like any MCU thanks to the GAP8 SDK, which includes:
- The RISC-V GCC/GDB toolchain with extensions to the optimizer for the extra instructions added to GAP8
- The MCU/Fabric Controller side tools include 2 OS choices (this list will be extended in the future): PULP OS, or Arm Mbed OS (for RISC-V/GAP8)
- Cluster side development tools – GAP8 AutoTiler, which generates C code to automate the movement of data between L2 or external memory and the cluster's shared data memory.
- Code generators for the cluster – GAP8 Generator Library including different algorithms developed using the GAP8 AutoTiler. It includes CNN layers, FFT, Matrix Operations, FIR Filters, and more.
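To give a feel for what automating data movement involves, here is a minimal sketch of tiling in plain Python. It is not the AutoTiler API or its generated code: `process_in_tiles`, `tile_bytes` and the per-tile `kernel` are hypothetical names, and a list slice stands in for the DMA transfers. The idea is that a buffer too large for the cluster's 64 kB shared data memory is split into tiles that fit, and each tile is brought in, processed, and written back in turn.

```python
from typing import Callable, List


def process_in_tiles(
    data: List[int],
    tile_bytes: int,
    bytes_per_item: int,
    kernel: Callable[[List[int]], List[int]],
) -> List[int]:
    """Apply `kernel` to `data` one tile at a time.

    Hypothetical illustration only: each slice models a transfer from a
    large memory into a small local one, and `extend` models the write-back.
    """
    items_per_tile = tile_bytes // bytes_per_item
    if items_per_tile <= 0:
        raise ValueError("tile is too small to hold a single item")

    result: List[int] = []
    for start in range(0, len(data), items_per_tile):
        tile = data[start:start + items_per_tile]  # stand-in for DMA in
        result.extend(kernel(tile))                # compute, then DMA out
    return result


if __name__ == "__main__":
    # 100,000 16-bit samples do not fit in 64 kB at once. Reserving 32 kB
    # for input tiles is an arbitrary split chosen for this example.
    samples = list(range(100_000))
    doubled = process_in_tiles(
        samples,
        tile_bytes=32 * 1024,
        bytes_per_item=2,
        kernel=lambda tile: [2 * x for x in tile],
    )
    print(len(doubled), doubled[:4])
```

This simple version only suits kernels that work on each tile independently. Operations such as convolutions need overlapping tiles, which is the kind of bookkeeping a code generator can take over.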
You can find more details about the GAP8 processor, and/or pre-order the development kit (199 Euros) scheduled to ship in April 2018 on the GreenWaves website. The company is also attending Embedded World in Germany at the RISC-V Foundation booth (Hall 3A, Booth 3A-419).
Thanks to TLS for the tip.
Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager, and starting to write daily news, and reviews full time later in 2011.
12 Replies to “GreenWaves GAP8 is a Low Power RISC-V IoT Processor Optimized for Artificial Intelligence Applications”
Does the compute core support risc-v vector extension?
Not according to this earlier article: https://www.cnx-software.com/2016/04/06/pulpino-open-source-risc-v-mcu-is-designed-for-iot-and-wearables/
BTW, has the vector extension been finalized already?
I mean I like RISC-V. We need more open hardware. But the chip is still kind of expensive. There are cheaper ARM M3 and M0 Chips out there. I hope we will see a high price drop in the future. Maybe you can get a huge discount when you buy 10.000.
Wow. Philipp completely misses the point of this 8.1 core shared memory MCU…
Comparing to an STM32H7 seems a bit apples to oranges, no? Wouldn’t their target applications be more suited for some kind of CPU/GPGPU combination? Or is this filling in the gaps that GPUs can’t handle?
At the end of the day, this seems incredibly tricky to program, and relying on their custom libraries seems a bit scary b/c good matrix/FFT libraries are incredibly tricky to write. Even a big player like ARM can’t provide decent FFT libraries for their NEON instruction set. Maybe OpenCL-on-CPU is viable? Though I haven’t tried, I hear that’s okay-ish
I think the point of this RISC-V MCU is doing inference at very low power (powered by a coin cell).
The systems with CPU/GPU will probably not even boot with that kind of power source.
Gotcha, that makes sense. Thanks for the explanation. I figured there were low power GPU-like solutions
Hi Jacky, it does not support the ‘official’ vector extension which I believe has not been standardized yet. I might be wrong on this.
The direction of the official vector extension is very much towards HPC and we are very much oriented towards low energy so we/PULP designed our own. The vector extension works on both 8 bit and 16 bit fixed point operands and includes an extremely useful single cycle vector dot product with accumulate.
Hope this helps
Hi. We compared to the H7 since Arm was publishing benchmarks on the M7 targeting exactly the same market as us. I’m not aware of any GPU that runs at the energy levels that we do. GAP8’s fabric controller/MCU core is pretty much as easy to program as any MCU on the market.
You are correct that the 8 core cluster is a more difficult engine to program but it follows a pretty classical OpenMP type programming model. We are releasing some tools and pre-made examples to help with understanding the cluster. The SDK comes with open source code generators for a variety of algos such as CNN/NN layers, FIR filters, FFTs, Matrix operations, HoG, MFCC, etc, etc. As to whether they are good or not that will be for you to judge. You will be able to see the code.
We are also working internally and with partners on end to end examples. We will be releasing a lot of this stuff onto Github when we ship the gapduino cards.
The M7 comparison you’ve carried is formidable. Do you have other power measurements as well (not necessarily against competitors) — FFT, FIR?
Yes. On FFT we have sum stuff and we are working on keyword spotting performance as well MFCC -> DNN.
We will be publishing figures on these as blogs on our site over the next few weeks.
some not sum 🙁