NVIDIA created a lot of buzz when it released the $99 Jetson Nano SBC, which features a 128-core Maxwell GPU and is said to deliver 472 GFLOPS of compute performance for modern AI workloads at around 5 watts.
But Jetson Nano is not the only low-cost platform that delivers high performance at low power for AI workloads. For example, the Rockchip RK3399Pro (with RK1808 NPU), found in boards such as Toybrick RK3399Pro, is said to deliver 3 TOPS for INT8, 300 GOPS for INT16, and 100 GOPS for FP16 inference.
Those operations-per-second figures can be confusing and misleading, so it’s important to look at the performance of actual neural network models. Rockchip did provide some RK3399Pro benchmarks last year for the Inception V3, ResNet34 and VGG16 models, comparing the results to Apple A11, Huawei Kirin 970, and NVIDIA Jetson TX2. Ideally, however, you’d want results from third parties, and Chengwei Zhang got hold of a Toybrick board and explains in detail on his blog how to run an Inception V3 Keras model on it.
There are two main steps:
- Freeze the Keras model into a TensorFlow graph and create an inference model with RKNN Toolkit. For performance reasons, this is done on a powerful Linux computer rather than on the target board.
- Load the RKNN model on an RK3399Pro dev board and make predictions.
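The two steps above can be sketched in Python as follows. This is a minimal sketch, not Chengwei’s exact code: it assumes the TensorFlow 1.x API for freezing and the RKNN Toolkit 1.x Python API (`rknn.api.RKNN`), and the file names, the `INPUT_SIZE` constant and the preprocessing values passed to `config()` are assumptions. Imports are deferred into each function so that the PC-only dependencies (TensorFlow) are not required on the board, and vice versa.

```python
import os

# Hypothetical file names; adjust to your own project.
KERAS_MODEL = "inception_v3.h5"
FROZEN_PB = "inception_v3_frozen.pb"
RKNN_MODEL = "inception_v3.rknn"
INPUT_SIZE = 299  # Inception V3 default input resolution


def freeze_keras_model(h5_path, pb_path):
    """Step 1a (Linux PC): freeze a Keras model into a TensorFlow graph.

    Assumes the TensorFlow 1.x API. Returns the input and output node
    names, which RKNN Toolkit needs when loading the frozen graph.
    """
    import tensorflow as tf
    from tensorflow.keras import backend as K

    K.set_learning_phase(0)  # build the graph in inference mode
    model = tf.keras.models.load_model(h5_path)
    sess = K.get_session()
    input_names = [t.op.name for t in model.inputs]
    output_names = [t.op.name for t in model.outputs]
    # Replace variables with constants so the graph is self-contained.
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), output_names)
    tf.train.write_graph(frozen, os.path.dirname(pb_path) or ".",
                         os.path.basename(pb_path), as_text=False)
    return input_names, output_names


def convert_to_rknn(pb_path, rknn_path, input_names, output_names):
    """Step 1b (Linux PC): build and export an .rknn model (RKNN Toolkit 1.x)."""
    from rknn.api import RKNN

    rknn = RKNN()
    try:
        # Assumed preprocessing: (pixel - 127.5) / 127.5, i.e. the [-1, 1]
        # scaling used by Keras Inception V3; RGB channel order unchanged.
        rknn.config(channel_mean_value="127.5 127.5 127.5 127.5",
                    reorder_channel="0 1 2")
        if rknn.load_tensorflow(tf_pb=pb_path, inputs=input_names,
                                outputs=output_names,
                                input_size_list=[[INPUT_SIZE, INPUT_SIZE, 3]]) != 0:
            raise RuntimeError("load_tensorflow failed")
        if rknn.build(do_quantization=False) != 0:
            raise RuntimeError("build failed")
        if rknn.export_rknn(rknn_path) != 0:
            raise RuntimeError("export_rknn failed")
    finally:
        rknn.release()


def predict_on_board(rknn_path, image):
    """Step 2 (RK3399Pro board): load the .rknn model and run one inference.

    `image` is assumed to be an INPUT_SIZE x INPUT_SIZE x 3 uint8 RGB array.
    """
    from rknn.api import RKNN

    rknn = RKNN()
    try:
        if rknn.load_rknn(rknn_path) != 0:
            raise RuntimeError("load_rknn failed")
        if rknn.init_runtime() != 0:
            raise RuntimeError("init_runtime failed")
        return rknn.inference(inputs=[image])
    finally:
        rknn.release()
```

On the PC, you would call `freeze_keras_model(KERAS_MODEL, FROZEN_PB)` and pass the returned node names to `convert_to_rknn()`; the resulting `.rknn` file is then copied to the board and used with `predict_on_board()`.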
The Toybrick RK3399Pro source code can be fetched with the repo tool:
repo init --repo-url http://github.com/aosp-mirror/tools_repo.git -u http://github.com/rockchip-toybrick/manifest.git -b master -m rk3399pro.xml
The tarball is also available on Baidu in case you run into problems with GitHub.
Chengwei goes into detail about the two steps described above, so I’ll skip right to the final results. The Toybrick RK3399Pro board achieves an average of 28.94 FPS with Inception V3, even faster than the 27.18 FPS Jetson Nano achieves with the much smaller MobileNetV2 model. Since Inception V3 is far more complex than MobileNetV2, we can expect an even larger gap between the two boards when they run identical models.
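Average FPS figures like these are usually obtained by timing a series of inferences. The Python sketch below shows one generic way to do it, assuming a hypothetical `infer` callable that wraps whatever runtime is used (for example an RKNN `inference()` call on the board). It is not necessarily the exact method used in Chengwei’s benchmark: it runs a few untimed warm-up inferences, then reports the number of timed frames divided by the total elapsed time.

```python
import time
from typing import Callable, Sequence


def benchmark_fps(infer: Callable[[object], object],
                  inputs: Sequence[object],
                  warmup: int = 5) -> float:
    """Average FPS = number of timed inferences / total elapsed seconds."""
    if not inputs:
        raise ValueError("need at least one input to time")
    for x in inputs[:warmup]:
        infer(x)  # warm-up runs (e.g. runtime initialization) are not timed
    start = time.perf_counter()
    for x in inputs:
        infer(x)
    elapsed = time.perf_counter() - start
    return len(inputs) / elapsed


def fps_to_ms(fps: float) -> float:
    """Convert a frame rate into average latency per frame in milliseconds."""
    return 1000.0 / fps


def fake_infer(x: object) -> None:
    """Hypothetical stand-in for a model that takes about 10 ms per frame."""
    time.sleep(0.01)


if __name__ == "__main__":
    fps = benchmark_fps(fake_infer, list(range(50)))
    print(f"{fps:.1f} FPS, {fps_to_ms(fps):.1f} ms per frame")
```

Expressed as latency, 28.94 FPS is about 34.6 ms per Inception V3 inference on RK3399Pro, versus about 36.8 ms per MobileNetV2 inference on Jetson Nano at 27.18 FPS.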
For reference, Rockchip reported that VGG16 ran at 50 fps on RK3399Pro versus 32 fps on the Jetson TX2 with its 256-core Pascal GPU, and that ResNet50 ran at 86 fps versus 82 fps, as shown in the older chart below.
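To put Rockchip’s numbers in perspective, this short Python sketch turns them into relative speedups (RK3399Pro frame rate divided by Jetson TX2 frame rate). The figures are the vendor’s own, taken from the paragraph above; the `REPORTED_FPS` dictionary and `speedups()` function are only illustrative names.

```python
# Frame rates reported by Rockchip: (RK3399Pro fps, Jetson TX2 fps).
REPORTED_FPS = {
    "VGG16": (50, 32),
    "ResNet50": (86, 82),
}


def speedups(table):
    """Return RK3399Pro fps / Jetson TX2 fps for each model."""
    return {model: rk / tx2 for model, (rk, tx2) in table.items()}


for model, ratio in speedups(REPORTED_FPS).items():
    print(f"{model}: RK3399Pro runs at {ratio:.2f}x the Jetson TX2 frame rate")
```

That works out to roughly 1.56x for VGG16 but only about 1.05x for ResNet50, so the advantage depends heavily on the network being run.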
The downside is that Toybrick RK3399Pro is much more expensive than Jetson Nano: the 3GB RAM version sells for $249 and the 6GB RAM model for $299 on the VAMRS website. Hopefully, a vendor will come up with a cheaper RK3399Pro board or, better yet, a Rockchip RK1808 board, which should offer similar inference performance at a much lower cost.
Thanks to Jon for the tip.