py-videocore6 Raspberry Pi 4 GPGPU Python Library Leverages VideoCore 6 GPU

The Raspberry Pi 4 SBC was released at the end of June with a new Broadcom BCM2711B SoC that includes a VideoCore 6 (VC6) GPU for 2D and 3D graphics, which could also be used for general-purpose GPU computing (GPGPU).

In the past, we've seen companies such as Idein leverage the VideoCore 4 GPGPU capabilities of the Raspberry Pi 3 / Zero to accelerate image recognition, and they released a Python library (py-videocore) for that purpose.

The problem is that the VideoCore 6 GPU found in the RPi 4 is quite different from the VideoCore 4 GPU in earlier Raspberry Pi boards, as forum member phiren explains:

I've been looking through the open source drivers and here are some of my observations:

  • vc6 is clearly derived from vc4, but it is significantly different. vc6 is only a slight extension over vc5
  • The QPU pipeline stays mostly the same, you still have an add ALU and a multiply ALU and it can issue two ALU OPs per cycle. There are still 4 SIMD lanes, interleaved over 4 cycles.
  • The instruction encoding for the QPUs is different, but the core instructions are the same.
  • Instructions for packed 8-bit int math have been dropped, along with most of the pack modes.
  • Instructions for packed 16-bit float math have been added (2 floats in a single operation).
  • the multiply ALU can now fadd, so you can issue two fadds per instruction.
  • the add ALU has gained a bunch of new instructions, that I don’t recognise by name and I haven’t explored.
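To make the "packed 16-bit float math" point more concrete, here's a minimal CPU-side sketch in plain Python that emulates adding two half-precision floats stored side by side in one 32-bit word. It only illustrates the general idea of packed math; it does not use py-videocore6, and the function names as well as the lane order (first value in bits 0-15) are hypothetical assumptions, not the actual VC6 QPU register layout or instruction set.

```python
import struct
from typing import Tuple


def pack_half2(lo: float, hi: float) -> int:
    """Pack two values as IEEE 754 half floats into one 32-bit word.

    Hypothetical layout: `lo` goes in bits 0-15, `hi` in bits 16-31.
    """
    lo_bits, hi_bits = struct.unpack("<2H", struct.pack("<2e", lo, hi))
    return lo_bits | (hi_bits << 16)


def unpack_half2(word: int) -> Tuple[float, float]:
    """Split a 32-bit word back into its two half-float values."""
    lo_bits = word & 0xFFFF
    hi_bits = (word >> 16) & 0xFFFF
    return struct.unpack("<2e", struct.pack("<2H", lo_bits, hi_bits))


def packed_fadd16(a: int, b: int) -> int:
    """Emulate one packed fp16 add: both halves are added independently,
    and each sum is rounded back to half precision when repacked."""
    a_lo, a_hi = unpack_half2(a)
    b_lo, b_hi = unpack_half2(b)
    return pack_half2(a_lo + b_lo, a_hi + b_hi)


if __name__ == "__main__":
    a = pack_half2(1.5, -2.0)
    b = pack_half2(0.25, 3.0)
    print(unpack_half2(packed_fadd16(a, b)))
```

One "instruction" here yields two results, which is why packed fp16 support can in principle double throughput for workloads that tolerate half precision.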

That means the py-videocore library cannot be used as-is on the RPi 4. But if you're interested in the subject, you'll be pleased to learn that Idein has now released the py-videocore6 Python library for GPGPU programming on the Raspberry Pi 4, together with three test code samples on GitHub.

py-videocore6 Raspberry Pi 4 GPGPU

There's virtually no documentation right now, and since my knowledge of GPU internals is limited, I don't clearly understand what the three tests do, but it looks like one allocates memory on the GPU/QPU, another displays some GPU parameters, and the last one runs some assembly code on the VideoCore 6 QPU.

If you want to try it out, you can install the library as follows:

and then run the tests:

However, there may not be much benefit to using a Raspberry Pi 4 over a Pi 3, as Akane, from the same forum thread, calculated that in theory VideoCore 6 should deliver only slightly more performance in terms of GFLOPS:

VideoCore IV @ 250MHz: 250 [MHz] x 3 [slice] x 4 [qpu/slice] x 4 [processor] x 2 [op/clock] = 24 Gflop/s
VideoCore IV @ 300MHz: 300 [MHz] x 3 [slice] x 4 [qpu/slice] x 4 [processor] x 2 [op/clock] = 28.8 Gflop/s
VideoCore VI @ 500MHz: 500 [MHz] x 2 [slice] x 4 [qpu/slice] x 4 [processor] x 2 [op/clock] = 32 Gflop/s
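The arithmetic above is easy to reproduce. The short Python sketch below multiplies the same factors (clock, slices, QPUs per slice, 4 lanes per QPU, 2 ops per clock) for each configuration. The function and parameter names are my own, and the result is only a theoretical peak; real kernels will be limited by memory bandwidth and instruction scheduling.

```python
def peak_gflops(clock_mhz: float, slices: int, qpus_per_slice: int,
                lanes: int = 4, ops_per_clock: int = 2) -> float:
    """Theoretical peak in Gflop/s:
    clock [MHz] x slices x QPUs/slice x lanes x ops/clock, divided by 1000."""
    return clock_mhz * slices * qpus_per_slice * lanes * ops_per_clock / 1000.0


# (clock in MHz, slices, QPUs per slice), taken from the figures above
CONFIGS = {
    "VideoCore IV @ 250MHz": (250, 3, 4),
    "VideoCore IV @ 300MHz": (300, 3, 4),
    "VideoCore VI @ 500MHz": (500, 2, 4),
}

if __name__ == "__main__":
    for name, (mhz, slices, qpus) in CONFIGS.items():
        print(f"{name}: {peak_gflops(mhz, slices, qpus):.1f} Gflop/s")
```

The higher VC6 clock is largely offset by having two slices instead of three: 32 / 28.8 is only about 1.11, or roughly 11% more peak throughput than a VideoCore IV at 300 MHz.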
