py-videocore6 Raspberry Pi 4 GPGPU Python Library Leverages VideoCore 6 GPU

The Raspberry Pi 4 SBC was released at the end of June with a new Broadcom BCM2711B SoC that includes a VideoCore 6 (VC6) GPU for 2D and 3D graphics, which could also be used for general-purpose GPU computing (GPGPU).

In the past, we’ve seen companies such as Idein leverage the VideoCore 4 GPGPU capabilities of the Raspberry Pi 3 / Zero to accelerate image recognition, and they released a Python library (py-videocore) for that purpose.

The problem is that the VideoCore 6 GPU found in the RPi 4 is quite different from the VideoCore 4 GPU in earlier Raspberry Pi Foundation boards, as forum member phiren explains:

I’ve been looking through the open source drivers and here are some of my observations:

  • vc6 is clearly derived from vc4, but it is significantly different. vc6 is only a slight extension over vc5.
  • The QPU pipeline stays mostly the same: you still have an add ALU and a multiply ALU, and it can issue two ALU ops per cycle. There are still 4 SIMD lanes, interleaved over 4 cycles.
  • The instruction encoding for the QPUs is different, but the core instructions are the same.
  • Instructions for packed 8-bit int math have been dropped, along with most of the pack modes.
  • Instructions for packed 16-bit float math have been added (2 floats in a single operation).
  • The multiply ALU can now fadd, so you can issue two fadds per instruction.
  • The add ALU has gained a bunch of new instructions that I don’t recognise by name and haven’t explored.
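
To make the "packed 16-bit float" point more concrete, here is a minimal host-side sketch of what it means for two half-precision floats to share one 32-bit word. It only uses Python's standard `struct` module and does not touch the GPU. The helper names `pack_half2` and `unpack_half2` are hypothetical, and placing the first value in the low 16 bits is an assumption made for illustration, not a documented VideoCore 6 register layout.

```python
import struct

def pack_half2(a, b):
    """Pack two floats as IEEE 754 half-precision values into one 32-bit word.

    Assumption for illustration: `a` goes in the low 16 bits, `b` in the high 16 bits.
    """
    # "<2e" = two little-endian half-precision floats (4 bytes total)
    raw = struct.pack("<2e", a, b)
    # Reinterpret those 4 bytes as a single unsigned 32-bit integer
    return struct.unpack("<I", raw)[0]

def unpack_half2(word):
    """Split a 32-bit word back into its two half-precision floats."""
    return struct.unpack("<2e", struct.pack("<I", word))

word = pack_half2(1.5, -2.0)
print(hex(word))
print(unpack_half2(word))
```

Values that are not exactly representable in half precision will be rounded when packed, which is the usual trade-off of 16-bit floats: twice the values per word at reduced precision.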

That means the py-videocore library cannot be used as is on the RPi 4. But if you’re interested in the subject, you’ll be pleased to learn that Idein has now released the py-videocore6 Python library for GPGPU programming on the Raspberry Pi 4, together with three test code samples, on GitHub.

There’s virtually no documentation right now, and since my knowledge of GPU internals is limited, I don’t clearly understand what the three samples do. It looks like one allocates memory on the GPU/QPU, another displays some GPU parameters, and the third runs some assembly on the VideoCore 6 QPU.

If you want to try it out, you can install the library as follows:


and then run the tests:


However, there may not be much benefit to using a Raspberry Pi 4 over a Pi 3: Akane, from the same forum thread discussed above, calculated that in theory VideoCore 6 should deliver only slightly more performance in terms of GFLOPS:

VideoCore IV @ 250MHz: 250 [MHz] x 3 [slice] x 4 [qpu/slice] x 4 [processor] x 2 [op/clock] = 24 Gflop/s
VideoCore IV @ 300MHz: 300 [MHz] x 3 [slice] x 4 [qpu/slice] x 4 [processor] x 2 [op/clock] = 28.8 Gflop/s
VideoCore VI @ 500MHz: 500 [MHz] x 2 [slice] x 4 [qpu/slice] x 4 [processor] x 2 [op/clock] = 32 Gflop/s
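
The arithmetic above can be reproduced with a short Python sketch. It is only a restatement of Akane's formula, not a benchmark. The function name `peak_gflops` and its parameter names are made up for illustration, and the figures are theoretical peaks that real workloads will not necessarily reach.

```python
def peak_gflops(clock_mhz, slices, qpus_per_slice, processors_per_qpu=4, ops_per_clock=2):
    """Theoretical peak in Gflop/s, following the formula quoted above.

    All factors are multiplied together; dividing by 1000 converts
    Mflop/s (because the clock is in MHz) to Gflop/s.
    """
    return clock_mhz * slices * qpus_per_slice * processors_per_qpu * ops_per_clock / 1000

# (label, clock in MHz, slices, QPUs per slice), as listed in the forum post
configs = [
    ("VideoCore IV @ 250MHz", 250, 3, 4),
    ("VideoCore IV @ 300MHz", 300, 3, 4),
    ("VideoCore VI @ 500MHz", 500, 2, 4),
]

for label, mhz, slices, qpus in configs:
    print(f"{label}: {peak_gflops(mhz, slices, qpus):.1f} Gflop/s")
```

On these numbers, VideoCore VI at 500MHz comes out at roughly 1.33 times the 250MHz VideoCore IV figure and about 1.11 times the 300MHz one: the doubled clock is largely offset by having two slices instead of three.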

 
