TensorFlow Lite for Microcontrollers Benchmarked on Linux SBCs

Dimitris Tassopoulos (Dimtass) decided to learn more about machine learning for embedded systems now that the technology is more mature. He wrote a series of five posts documenting his experience with low-end hardware such as the STM32 Bluepill board, Arduino UNO, and ESP8266-12E module, starting with simple NN examples before moving on to TensorFlow Lite for Microcontrollers.

Dimitris recently followed up his latest “stupid project” (that’s the name of his blog, I’m not being demeaning here :)) by running and benchmarking TensorFlow Lite for Microcontrollers on various Linux SBCs.

But why, you might ask? Dimitris tried to build the tflite C++ API for Linux, but found it hard to build, and no pre-built binaries are available except for x86_64. He had no such issues with the tflite-micro API, even though it’s really meant for bare-metal MCU platforms.

Let’s get straight to the results, which also include a Ryzen 2700X desktop processor for reference:

| SBC | Average for 1000 runs (ms) |
|---|---|
| Ryzen 2700X (not an SBC) | 2.19 |
| AML-S905X-CC | 15.54 |
| Raspberry Pi 3 B+ | 13.47 |
| Jetson Nano | 9.34 |
| NanoPi Duo | 36.76 |
| NanoPi NEO | 16 |
| NanoPi NEO2 | 22.83 |
| NanoPi NEO4 | 5.82 |
| NanoPi K1 Plus | 14.32 |
| Orange Pi Prime | 18.40 |
| BeagleBone Black | 97.03 |
| STM32F746 @ 216 MHz | 76.75 |
| STM32F746 @ 288 MHz | 57.95 |

And in chart form.


The Ryzen 2700X processor is the fastest, but the Rockchip RK3399 SoC in the NanoPi NEO4 is only about 2.7 times slower (5.82 ms vs. 2.19 ms), and it outperforms all the other Arm boards, including the Jetson Nano. Not bad for a $50 board. The Allwinner H3-based NanoPi NEO also deserves a mention: at $10, it offers the best performance/price ratio in this test.
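The ratios quoted above can be recomputed directly from the table. Below is a minimal Python sketch; the $50 and $10 prices are the approximate figures given in this article, and “inferences per second per dollar” is just one illustrative way to express performance/price, not a metric Dimitris used.

```python
from typing import Dict

# Average inference time over 1000 runs (ms), copied from the table above.
RESULTS_MS: Dict[str, float] = {
    "Ryzen 2700X": 2.19,
    "AML-S905X-CC": 15.54,
    "Raspberry Pi 3 B+": 13.47,
    "Jetson Nano": 9.34,
    "NanoPi Duo": 36.76,
    "NanoPi NEO": 16.0,
    "NanoPi NEO2": 22.83,
    "NanoPi NEO4": 5.82,
    "NanoPi K1 Plus": 14.32,
    "Orange Pi Prime": 18.40,
    "BeagleBone Black": 97.03,
    "STM32F746 @ 216 MHz": 76.75,
    "STM32F746 @ 288 MHz": 57.95,
}


def slowdown_vs(reference: str, results: Dict[str, float]) -> Dict[str, float]:
    """Return how many times slower each board is than the reference board."""
    ref_ms = results[reference]
    return {name: ms / ref_ms for name, ms in results.items()}


def inferences_per_second_per_dollar(ms: float, price_usd: float) -> float:
    """Throughput (inferences per second) divided by the board price."""
    return (1000.0 / ms) / price_usd


if __name__ == "__main__":
    ratios = slowdown_vs("Ryzen 2700X", RESULTS_MS)
    for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{name:22s} {ratio:6.2f}x")
    # Approximate prices mentioned in the article (illustrative only).
    print("NEO4:", inferences_per_second_per_dollar(RESULTS_MS["NanoPi NEO4"], 50.0))
    print("NEO: ", inferences_per_second_per_dollar(RESULTS_MS["NanoPi NEO"], 10.0))
```

With these numbers the NanoPi NEO4 comes out at roughly 2.66x the Ryzen time, while the NanoPi NEO delivers about 6.25 inferences/s per dollar versus about 3.4 for the NEO4, which matches the performance/price remark.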

If you want to try it on your own board or computer, you can do so as follows:


Note that this is for AArch64 (64-bit Arm) targets. The last command line differs for other architectures; for example, on a Cortex-A7-based SoC, the program is named “mnist-tflite-micro-armv7l” instead.
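Since the program name apparently tracks the CPU architecture, a small Python helper can derive it from `platform.machine()`, which returns the same string as `uname -m` (e.g. `aarch64` on 64-bit Arm, `armv7l` on a 32-bit Cortex-A7 system). The `mnist-tflite-micro-<arch>` pattern is an assumption extrapolated from the armv7l example, and `tflite_micro_binary_name` is a hypothetical helper; check the actual file names in Dimitris’ repository.

```python
import platform


def tflite_micro_binary_name(machine: str = "") -> str:
    """Return the (assumed) benchmark program name for a CPU architecture.

    ASSUMPTION: programs follow the pattern mnist-tflite-micro-<uname -m>,
    extrapolated from the "mnist-tflite-micro-armv7l" example.
    """
    arch = machine or platform.machine()  # e.g. "aarch64" or "armv7l"
    return f"mnist-tflite-micro-{arch}"


if __name__ == "__main__":
    print(tflite_micro_binary_name())
```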

Note that while tflite-micro is easy to port to any SBC, it has some drawbacks compared to the tflite C++ API. Notably, tflite-micro does not support multi-threading, and it’s much slower than the tflite C++ API:

| CPU | tflite-micro/tflite speed ratio |
|---|---|
| Ryzen 2700X | 10.63x |
| Jetson Nano (5W) | 9.46x |
| Jetson Nano (MAXN) | 3.86x |
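To make the two metrics concrete: the first table reports the mean time over 1000 inferences, and the ratio presumably divides the tflite-micro time by the tflite C++ API time, so 10.63x means tflite-micro takes 10.63 times longer on the Ryzen. The Python sketch below only illustrates that arithmetic; Dimitris’ benchmark is a C++ program, and `fn` stands in for a hypothetical inference call.

```python
import time
from typing import Callable


def average_ms(fn: Callable[[], object], runs: int = 1000) -> float:
    """Call fn `runs` times and return the mean wall-clock time per call in ms."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed_s = time.perf_counter() - start
    return elapsed_s * 1000.0 / runs


def speed_ratio(micro_ms: float, tflite_ms: float) -> float:
    """tflite-micro time / tflite time; a value above 1 means tflite-micro is slower."""
    return micro_ms / tflite_ms


if __name__ == "__main__":
    # Dummy workload standing in for a real inference call (hypothetical).
    dummy_ms = average_ms(lambda: sum(range(10_000)))
    print(f"dummy workload: {dummy_ms:.4f} ms per call")
```

Averaging over many runs smooths out timer resolution and scheduling noise, which matters when a single inference takes only a few milliseconds.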

The model is also embedded in the executable instead of being loaded from a file, unless you implement your own parser. You’ll find a more detailed analysis and explanation in Dimtass’ blog post.
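Because tflite-micro has no file loader, the `.tflite` flatbuffer is normally compiled into the program as a byte array; the TensorFlow Lite for Microcontrollers documentation suggests generating it with `xxd -i`. Here is a minimal Python equivalent, assuming a C++ target. The `g_model` default name and the 16-byte `alignas` are illustrative choices modeled on the tflite-micro examples, not requirements taken from Dimitris’ code.

```python
import sys


def tflite_to_c_array(model: bytes, var_name: str = "g_model") -> str:
    """Render a .tflite flatbuffer as C++ source, similar to `xxd -i` output."""
    if not model:
        raise ValueError("model is empty")
    # Align the array, as the tflite-micro example models do (16 bytes assumed).
    lines = [f"alignas(16) const unsigned char {var_name}[] = {{"]
    for i in range(0, len(model), 12):  # 12 bytes per line, like xxd -i
        chunk = model[i:i + 12]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    lines.append("};")
    lines.append(f"const unsigned int {var_name}_len = {len(model)};")
    return "\n".join(lines)


if __name__ == "__main__":
    # Usage: python to_c_array.py model.tflite > model_data.cc
    if len(sys.argv) == 2:
        with open(sys.argv[1], "rb") as f:
            print(tflite_to_c_array(f.read()))
    else:
        print("usage: to_c_array.py MODEL.tflite", file=sys.stderr)
```

The trade-off is that changing the model means regenerating this file and rebuilding the program, which is exactly the limitation described above.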
