Allwinner V831 NPU (Neural Processor Unit) reverse-engineered

When Sipeed introduced the MAIX-II Dock AIoT vision development kit, they asked the community to help reverse-engineer Allwinner V831’s NPU in order to build an open-source AI toolchain based on NCNN.

Sipeed had already decoded the NPU registers, and Jasbir offered to help with the next step, receiving a free sample board to try it out. Good progress has been made, and it’s now possible to detect objects such as a boat using the cifar10 object recognition sample.

V831 NPU open-source toolchain

Allwinner V831’s NPU is based on a customized implementation of the open-source NVIDIA Deep Learning Accelerator (NVDLA) architecture, something that Allwinner (through Sipeed) asked us to remove from the initial announcement. After some reverse-engineering work, Jasbir reported the following key findings:

  1. The NPU clock defaults to 400 MHz, but can be set between 100 and 1200 MHz
  2. The NPU is implemented with the nv_small configuration (NV Small Model) and relies on shared system memory for all data operations.
  3. int8 and int16 are supported, with int8 preferred for speed and because of the limited on-board memory (64 MB)
  4. 64 MACs (Atomic-C * Atomic-K)
  5. Memory-mapped registers programmable from userspace
  6. Physical address locations are required when referencing weights & input/output data locations, meaning kernel memory needs to be allocated and the physical addresses retrieved if accessed from userspace.
  7. NPU weights and input/output data follow a similar layout to the NVDLA private formats, so formats like nhwc or nchw must be transformed before being fed to the NPU.
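To illustrate the last point, here is a minimal Python sketch of the kind of repacking involved. It is not taken from the v831-npu repository: the function name is hypothetical, and the channel-group size of 8 is an assumption made for illustration, since the exact V831 layout is not documented here. The sketch turns a flat nchw (single image) int8 buffer into channel groups, zero-padding the last group.

```python
def pack_chw_to_channel_groups(data, channels, height, width, group=8):
    """Repack a flat CHW int8 buffer into [C/group][H][W][group] order.

    `group` is an assumed packing width, not a confirmed V831 value.
    Channels beyond `channels` in the last group are left as zero padding.
    """
    if len(data) != channels * height * width:
        raise ValueError("buffer size does not match C*H*W")
    n_groups = (channels + group - 1) // group
    out = bytearray(n_groups * height * width * group)  # zero-initialised
    for c in range(channels):
        g, lane = divmod(c, group)  # which group, and position inside it
        for y in range(height):
            for x in range(width):
                src = (c * height + y) * width + x
                dst = ((g * height + y) * width + x) * group + lane
                out[dst] = data[src] & 0xFF  # store as raw two's-complement byte
    return bytes(out)


# Example: 3 channels, 1 row, 2 pixels per row.
packed = pack_chw_to_channel_groups(bytes([1, 2, 3, 4, 5, 6]), 3, 1, 2)
```

Each pixel ends up as one contiguous run of `group` bytes holding its channel values, which is the general idea behind channel-packed accelerator formats. The real conversion for weights is more involved.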

Those findings allowed him to adapt the code for the cifar10 demo from Arm’s CMSIS_5 NN library, removing all Allwinner closed-source binaries in the process. You’ll find the source code in the v831-npu repository on GitHub, and you can check out Jasbir’s post to find out how to try it, provided you have an Allwinner V831 board on hand.

The current code supports direct convolutions, bias addition, relu/prelu, element-wise operations, and max/average pooling. There’s more work to be done, including the development of a weight and input/output data conversion utility and integration into an existing AI framework.
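For readers unfamiliar with these operations, the following Python sketch shows what a direct convolution with bias addition and relu, followed by 2x2 max pooling, computes on a single-channel integer image. It is a plain software reference written for this explanation, with hypothetical function names. It is not the NPU code, and it ignores quantization and scaling details.

```python
def conv2d_bias_relu(image, kernel, bias):
    """Stride-1, no-padding direct convolution, then bias add and ReLU.

    As in most neural-network libraries, the kernel is not flipped
    (strictly speaking this is cross-correlation).
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = bias  # Python ints stand in for a wide accumulator
            for ky in range(kh):
                for kx in range(kw):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
            row.append(max(acc, 0))  # ReLU
        out.append(row)
    return out


def max_pool_2x2(fmap):
    """Non-overlapping 2x2 max pooling; odd trailing rows/columns are dropped."""
    return [
        [max(fmap[y][x], fmap[y][x + 1], fmap[y + 1][x], fmap[y + 1][x + 1])
         for x in range(0, len(fmap[0]) - 1, 2)]
        for y in range(0, len(fmap) - 1, 2)
    ]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
features = conv2d_bias_relu(image, kernel, bias=-10)
pooled = max_pool_2x2(features)
```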

The good news is that the work should also benefit other platforms featuring an NVDLA-based AI accelerator, including the Beagle V SBC, which has just started to find its way into the hands of developers in the last few days.


9 Replies to “Allwinner V831 NPU (Neural Processor Unit) reverse-engineered”

  1. “Memory-mapped register programmable from userspace” and “Physical address locations are required” sounds pretty horrible from a security point of view :/

  2. Allwinner wants to keep this chip very secret. So I will help them out by not learning how to use it and by not selling any in my products. The best way to ensure secrecy is not having any customers — because having customers might expose those secrets!

    BTW, we are happily progressing along using the RV1109/1126.

      1. We have boards from Firefly and two other vendors that are local to Shenzhen. I don’t think Sipeed has shipped yet.

        The RV1109/26 uses the same platform as all of the other Rockchip CPUs, so you can use the RK3399PRO to develop AI vision software and then port it over to the RV1109 without much hassle. Just note that the RV1109 is much slower than the RK3399PRO.

        Firefly has all of the code here:
        Just use the RV1109 manifest to get the right build.

    1. I would like to know why you claim that. Allwinner is a Chinese brand like Rockchip. Yeah, they offer quite bad products, but I think this news is a good thing, not the opposite. Why do you claim that buying Allwinner is so harmful?

      1. Allwinner is keeping the V831 SDK tightly controlled and so far I have been unable to get a copy out of them. I suppose I could pay Sochip $800 to get access. I’m just not very excited about paying $800 just to evaluate a chip that I haven’t even decided if I want to use in a design yet. And the last time I paid Sochip $800 they just gave me a board and a copy of the SDK and then never responded to any of our questions.

        On the other hand I easily acquired the Rockchip RV1109 SDK. It is well documented, source code accessible on Github, dev board available from ten vendors. So far all of the SDK features we have tried work. We have been coding for several months and it is almost certain we will put an RV1109 product into production.

  3. Very nice. I hope this progresses like the FPGA reverse engineering has. If we get a toolchain that can target a bunch of cheap SoCs with NPUs things get really interesting I think.
