Rockchip RK3588’s NPU open-source driver performs object detection at 30 FPS

Over the last couple of months, Tomeu Vizoso has been working on an open-source driver for the NPU (Neural Processing Unit) found in the Rockchip RK3588 SoC. The project has progressed nicely, with object detection running at 30 FPS using the SSDLite MobileDet model on just one of the AI accelerator’s three cores.

Many recent processors include AI accelerators that only work with closed-source drivers, but we had already seen reverse-engineering work on the Allwinner V831’s NPU a few years ago, and earlier this year we noted that Tomeu Vizoso had released the Etnaviv open-source driver for the Vivante NPU in the Amlogic A311D. Tomeu has now started porting his Teflon TensorFlow Lite driver to the Rockchip RK3588 NPU, which is closely based on NVIDIA’s open-source NVDLA IP.

Rockchip RK3588 open source NPU driver

He started in March, leveraging the reverse-engineering work already done by Pierre-Hugues Husson and Jasbir Matharu, and was quickly able to run TensorFlow Lite’s Conv2D and DepthwiseConv2D operations. Only two weeks later, the MobileNetV1 model was running on the Pine64 QuartzPro64 SBC with the same performance as the blob (closed-source binary).
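For readers unfamiliar with the two TensorFlow Lite operations mentioned above, here is a minimal Python sketch of their reference semantics. It is illustrative only and says nothing about how Teflon or the RK3588 NPU actually compute them. The function names are hypothetical, and the sketch assumes a single image in [H][W][C] layout, stride 1, "valid" padding, no bias, a depthwise multiplier of 1, and a filter layout of [KH][KW][C_in][C_out] chosen for readability (TensorFlow Lite stores its filters differently).

```python
# Hypothetical reference implementations of Conv2D vs DepthwiseConv2D.
# Assumptions: input x is [H][W][C] (one image, no batch dim), stride 1,
# "valid" padding, no bias/activation, float math, depth multiplier 1.
# Not TFLite's filter layout, and not how any NPU driver computes these ops.


def conv2d(x, w):
    """Standard convolution.
    x: [H][W][C_in], w: [KH][KW][C_in][C_out] -> [H-KH+1][W-KW+1][C_out].
    Every output channel is a weighted sum over ALL input channels."""
    H, W, C_in = len(x), len(x[0]), len(x[0][0])
    KH, KW, C_out = len(w), len(w[0]), len(w[0][0][0])
    out = []
    for i in range(H - KH + 1):
        row = []
        for j in range(W - KW + 1):
            pixel = []
            for co in range(C_out):
                acc = 0.0
                for ki in range(KH):
                    for kj in range(KW):
                        for ci in range(C_in):
                            acc += x[i + ki][j + kj][ci] * w[ki][kj][ci][co]
                pixel.append(acc)
            row.append(pixel)
        out.append(row)
    return out


def depthwise_conv2d(x, w):
    """Depthwise convolution (depth multiplier 1).
    x: [H][W][C], w: [KH][KW][C] -> [H-KH+1][W-KW+1][C].
    Each channel is filtered on its own; channels are never mixed."""
    H, W, C = len(x), len(x[0]), len(x[0][0])
    KH, KW = len(w), len(w[0])
    out = []
    for i in range(H - KH + 1):
        row = []
        for j in range(W - KW + 1):
            pixel = []
            for c in range(C):
                acc = 0.0
                for ki in range(KH):
                    for kj in range(KW):
                        acc += x[i + ki][j + kj][c] * w[ki][kj][c]
                pixel.append(acc)
            row.append(pixel)
        out.append(row)
    return out


if __name__ == "__main__":
    # 3x3 image with 2 channels: channel 0 is all ones, channel 1 all zeros.
    x = [[[1.0, 0.0] for _ in range(3)] for _ in range(3)]
    # Depthwise: one 2x2 all-ones filter per channel, channels stay separate.
    print(depthwise_conv2d(x, [[[1.0, 1.0]] * 2] * 2)[0][0])  # [4.0, 0.0]
    # Conv2D: 2x2x2 all-ones filter with one output channel, channels summed.
    print(conv2d(x, [[[[1.0], [1.0]]] * 2] * 2)[0][0])  # [4.0]
```

Per output pixel, Conv2D performs KH × KW × C_in × C_out multiply-accumulates, while DepthwiseConv2D performs only KH × KW × C. MobileNet-style models pair depthwise layers with cheap 1x1 Conv2D layers to exploit that difference, which is why both operations had to work before MobileNetV1 could run.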

The work was much easier than on the VeriSilicon Vivante NPU because much of the reverse engineering had already been done, and because NVDLA is open source, so at least some documentation was available, which was not the case for the Vivante NPU. As a result, it took only four weeks (not full-time) to get the object detection demo running at 30 FPS on the Rockchip RK3588’s NPU.

You’ll find the source code for the Teflon project on the Freedesktop website, and you can follow the project’s status on Tomeu’s blog. Next, Tomeu plans to write a kernel driver for mainline Linux in the drivers/accel subsystem. There’s still a lot of work to be done, and it’s unclear how long it will take, especially since he is working on several NPUs and will have to split his time between each implementation unless additional contributors join the project(s).

Support CNX Software! Donate via cryptocurrencies, become a Patron on Patreon, or purchase goods on Amazon or Aliexpress

15 Comments
Upgrade pi-top [3]
12 days ago

Minor correction: according to Tomeu’s blog post, that 30fps is when running on just one of the three cores!

Upgrade pi-top [3]
12 days ago

I did too at first!

Chris
11 days ago

It’s the first time I’ve heard about 3 cores inside the NPU of the 3588.

Jasbir
12 days ago

More info on the internals of the RK3588 NPU in my post, where I discuss its use for LLMs.

Upgrade pi-top [3]
12 days ago

Nicely done! Have you considered collaborating with Tomeu?

Jasbir
12 days ago

The rk llm sdk has already been released as a beta. Performance varies depending on the model, so you need to set your expectations.

megous
12 days ago

Creepy

Upgrade pi-top [3]
12 days ago

How so? Great to see you here Megi! Looking forward to the next update of your xnux blog

megous
12 days ago

Not the work on the accelerator, just the object/face recognition tech in general. 🙂

Upgrade pi-top [3]
12 days ago

Ah! Yes…

Scott Lamb
12 days ago

Exciting! Hardware wise these RK3588 boards are perfect for NVRs. It’s nice to see the software progress toward taking advantage of that on mainline.

Luc
12 days ago

It’s just incredible to measure the amount of work, patience, talent and so on needed to develop such an open-source project. Starting from close to NIL, without any reliable doc, on a quite advanced and new topic, this is remarkable.

petrosilius
12 days ago

Excellent work!
Do you know which available cameras work with the RK3588? Last time I checked, not many were supported by the libcamera framework, and this limits the usability quite a lot.
