Run Raspberry Pi 4 Cooler with a New Firmware & One Easy Trick

Raspberry Pi 4 launched last June with a lot of buzz as it offered much better performance, more memory, and faster I/Os compared to Raspberry Pi 3 model B+.

Benchmarks confirmed the improved performance but also revealed that a heatsink was necessary to sustain it under heavy loads. Some companies also launched an oversized heatsink + fan combo for the board, but that is over the top and not necessary, except possibly at very high room temperatures (50°C?).

The Raspberry Pi Foundation also worked on improving the firmware to lower CPU temperature and power consumption, and a few days later released a beta version of the VLI firmware that dropped the temperature by 3 to 5°C. It was a good effort, but sadly the updated VLI firmware (used for the PCIe to USB controller) also had the side effect of much slower USB performance. A new VLI firmware released in September offered both a lower temperature and good performance.

The Foundation has now published a blog post detailing all the work they have done to improve the thermal performance of Raspberry Pi 4 during the four months since launch. They wrote a fairly long post, so I’ll provide the main takeaways.

They stress the board with stress-ng (CPU) and glxgears (GPU):

It should be noted that the glxgears utility relies on OpenGL, and most Arm GPUs use OpenGL ES, so it’s quite possible several of the commands were emulated on the CPU. es2gears might have been a better choice since it uses OpenGL ES APIs only. I’m not sure how much that matters on the VideoCore V GPU in the Broadcom BCM2711 processor.

Five firmware versions have been released since the launch of RPi 4:

  1. Launch firmware
  2. VLI firmware mentioned above
  3. VLI + SDRAM firmware with optimizations on how the LPDDR4 is managed without impact on performance
  4. VLI, SDRAM, Clocking, and Load-Step Firmware which changes how the processor increases and decreases its clock-speed in response to demand and temperature
  5. Current beta firmware – In testing and due for public release soon. It brings several improvements, including finer-grained control over SoC operating voltages and optimized clocking for the HDMI video state machines.

If you want to make sure your Raspberry Pi runs the latest firmware, run the following commands:

And now the results with the Raspberry Pi 3 B+ as reference, and Raspberry Pi 4 with different versions of the firmware. As I understand it, the boards were not fitted with a heatsink.

Power Consumption in Watts

There’s been good progress with idle power consumption dropping from 2.89 Watts at launch to 2.10 Watts with the beta firmware, and from 7.28 Watts to 6.41 Watts under load.
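To put those figures in relative terms, here is a small Python sketch that works out the percentage drop between the launch firmware and the beta firmware. It only uses the four wattages quoted above; the helper name `percent_reduction` is made up for illustration.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, in percent."""
    return (before - after) / before * 100.0


# Wattages quoted in the article (launch firmware vs. current beta firmware).
idle_drop = percent_reduction(2.89, 2.10)
load_drop = percent_reduction(7.28, 6.41)

print(f"Idle: {idle_drop:.1f}% lower")
print(f"Load: {load_drop:.1f}% lower")
```

That works out to roughly 27% less power at idle and about 12% less under load, so most of the gain is at the idle end.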

Time to Thermal Throttling under CPU & GPU load in Seconds

The second chart shows how long it takes for the system to throttle under CPU and GPU load. Raspberry Pi 3B+ started to throttle after just 19 seconds, while Raspberry Pi 4 did so after around 60 seconds at launch and now takes close to 180 seconds before throttling. If the board is placed vertically, however, cooling improves a lot: the idle temperature drops by 2°C, and it takes over 400 seconds before throttling.
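If you want to check whether your own board has throttled, one option is to decode the bitmask printed by `vcgencmd get_throttled`. The sketch below is my own illustration, not something from the Foundation's post. The bit meanings are the ones I know from the Raspberry Pi documentation, and the sample value is hypothetical rather than a measured result.

```python
from typing import List

# Bit positions reported by `vcgencmd get_throttled`.
THROTTLE_FLAGS = {
    0: "under-voltage detected",
    1: "Arm frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "Arm frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output: str) -> List[str]:
    """Turn a line such as 'throttled=0x50005' into readable flag names."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [name for bit, name in sorted(THROTTLE_FLAGS.items())
            if value & (1 << bit)]


# Hypothetical sample value, not taken from the article or the Foundation's tests.
sample = "throttled=0x50005"
for flag in decode_throttled(sample):
    print(flag)
```

On a real board you would feed the function the output of `vcgencmd get_throttled` (for instance via `subprocess.run`) instead of the hard-coded sample string.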

Testing was completed with a real-life test, as opposed to the synthetic tests above, that consisted of building the Linux kernel on both Raspberry Pi 3 B+ and Raspberry Pi 4 with the latest beta firmware:

  • RPi 3B+ – Total time: 5097 seconds (1h24m57s); quickly throttles to 1.2 GHz
  • RPi 4B – Total time: 2660 seconds (44m20s); no throttling, runs at 1.5 GHz during compilation
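The two build times can be cross-checked with a few lines of Python. This is just a sketch using the totals quoted above; `format_hms` is a hypothetical helper name.

```python
def format_hms(seconds: int) -> str:
    """Format a duration in seconds as e.g. '1h24m57s'."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h{minutes:02d}m{secs:02d}s"


# Kernel build times quoted in the article, in seconds.
rpi3_seconds = 5097
rpi4_seconds = 2660

speedup = rpi3_seconds / rpi4_seconds

print(format_hms(rpi3_seconds))
print(format_hms(rpi4_seconds))
print(f"RPi 4B is about {speedup:.2f}x faster")
```

So the Raspberry Pi 4 completes the build a bit over 1.9 times faster than the Raspberry Pi 3 B+ in this test.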

Note that the blog post does not mention the room temperature, which can affect throttling quite a bit. If your board is inside an enclosure, results will also vary depending on the case design and, again, the room temperature.


26 Replies to “Run Raspberry Pi 4 Cooler with a New Firmware & One Easy Trick”

    1. There is no ‘essence’ available since Gareth Halfacree managed to mention the word ‘performance’ almost a dozen times but didn’t once check whether any of the various ThreadX updates had an impact on performance (ThreadX is the primary operating system controlling the hardware on any RPi). He used load generators in the most primitive way and looked only at clockspeeds. Impressive…

    2. Well those powerful SoCs typically require a heatsink to operate. When was the last time your x86 operated without a heatsink? Maybe 486 or Pentium era before MMX?

      1. True, but which embedded x86 board is sold without heat sink /fan ? The RPI foundation should swallow their pride (and price) and add a good heat sink as standard: nobody is infallible.

        1. >True, but which embedded x86 board is sold without heat sink /fan ?

          Galileo doesn’t have a heat sink or fan.

      2. Usually if you need a heatsink or fan the manufacturer lets you know this fact and most importantly also provides a mounting concept.

      3. My UK101 (Ohio Superboard2 clone) didn’t have a heatsink on the CPU, nor did my Commodore Amiga 500. I recall how a key selling point of personal computers was they were quiet and didn’t have the huge cooling requirements of mainframes.

  1. Ok let’s test. Baseline is latest ‘Raspbian Lite’ image. I’m running sbc-bench 0.6.9 with ondemand governor (defaults):

    ThreadX version cd3add54955f8fa065b414d8fc07c525e7ddffc8 (Sep 24 2019)

    A USB3-connected Samsung EVO840 in a JMS567 enclosure, one time with ondemand cpufreq governor, one time with performance (checked using iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2 ; iozone -e -I -a -s 1000M -r 16384k -i 0 -i 1):

    Translated that’s 4740 write and 6385 read 4k random IOPS and 318/360 MB/s storage performance.

    1. Now let’s check the influence of the VLI USB host adapter firmware. After unpacking and applying the firmware update, followed by a cold boot repeating the iozone tests:

      That’s now only 3278 write and 3225 read 4k random IOPS (random 4k read performance almost halved) and also sequential transfer speeds are lower than before.
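For readers who prefer bandwidth to IOPS, the 4k random numbers in this thread can be converted with a short Python sketch. It assumes each iozone `4k` record is 4 KiB (4096 bytes), and `iops_to_mib_per_s` is a made-up helper name.

```python
RECORD_SIZE_BYTES = 4 * 1024  # assumption: iozone "-r 4k" means 4 KiB records


def iops_to_mib_per_s(iops: int, record_size: int = RECORD_SIZE_BYTES) -> float:
    """Convert an IOPS figure into MiB/s for a given record size."""
    return iops * record_size / (1024 * 1024)


# 4k random IOPS quoted in the comments, before and after the VLI update.
results = {
    "write, original VLI firmware": 4740,
    "read, original VLI firmware": 6385,
    "write, updated VLI firmware": 3278,
    "read, updated VLI firmware": 3225,
}

for label, iops in results.items():
    print(f"{label}: {iops} IOPS = {iops_to_mib_per_s(iops):.1f} MiB/s")
```

Under that assumption, 4k random reads go from roughly 25 MiB/s to under 13 MiB/s with the updated VLI firmware.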

      1. So another load of fud… It almost sounded too good to be true, from just skimming over.

        Now just jerry missing and some super logical jamesh bs…

      2. Now testing with latest ‘firmware’ update applied (using rpi-update, an apt full-upgrade didn’t do the trick). ThreadX version is aabb1fb5c19d80db268aeccd67b9f4e0f3d48a2a from Nov 29 2019:

        Memory bandwidth and latency remain unchanged, the 7-zip scores look lower but the results have been influenced by swapping as can be seen in sbc-bench’s detailed output close to the end:

        USB3 performance also remains unchanged:

        Measured temperatures are significantly lower now, with latest ThreadX and kernel update the RPi Trading guys now also copied the concept of somewhat ‘fine grained cpufreq control’ (not only 600 MHz + ‘Turbo’ mode any longer but they really managed to define 4 different clockspeeds. Amazing!).

        So the only update harming performance is the one for the USB controller negatively affecting use cases like ‘rootfs on USB connected SSD’ or database workloads. For these use cases reverting back to firmware vl805_fw_013701 might be an idea (then disabling PCIe power management which ends up with higher temperatures but almost twice as high IOPS/bandwidth numbers with 16k chunks for example)

        1. Interesting. Thanks for explaining.
          Device available for 35-55$, additional heatsink/cooling fan/heatpipe solution 2.7$-21.99$/~£6.72/21.99-29.99$.

          Even cheap devices won’t be limited through IOPS or bandwidth in a (few) couple of years on user profile browsing or multimedia tasks. Avoiding hot pcb keeps being a challenge, maybe.
          What is our responsibility for power savings (on server types and consumer side load balancing: AI optimization)? That might be a bigger question.

  2. Most important for me is that the latest beta firmware adds HDMI power off for those of us using a pi4 as a desktop.

  3. Preliminary conclusion from my side (as someone who is only interested in ‘light server’ use cases): Since I need storage performance I reverted the USB controller’s firmware back to the initial version vl805_fw_013701. I use the latest beta ThreadX release (called firmware in RPi land) and opted for a slight ‘overclock’ to 1750 MHz:

    Then tested USB3 storage performance again which looks pretty good (especially random IOPS are good again — and this even with ondemand cpufreq governor which helps reducing idle temperatures and consumption):

    No heatsink, just the board standing upright at an ambient temperature of ~23°C:

    Since I’ve only a 1GB board again swapping happened with the 7-zip benchmark tests which would show otherwise higher scores (close to RK3399 performance level). No throttling happened with this type of workload even at 1750 MHz but as soon as you use crappy enclosures like especially the official one from RPi Trading Inc. you can completely forget about this level of performance since then the board will heat up +30°C easily and throttling will happen for sure.

    The remaining downsides: missing ARMv8 Crypto Extensions and of course the closed source nature of every RPi. The primary operating system called ThreadX is in control of the hardware and a Linux running on this board is only a 2nd or 3rd class citizen.

    BTW: If you enabled 1750 MHz in the ThreadX control file called config.txt then funny things happen. Linux now operates with 5 different cpufreq OPPs that are all fake as usual:

    So when the Linux kernel thinks it would run at 1800 MHz the same happens as if it thinks it would run at 1200 MHz: most probably in reality clocking with 1750 MHz. But you never know since ThreadX does its own thing anyway. Watching the output of sbc-bench -m (monitoring mode) on such an ‘overclocked’ but idle RPi 4 is really fun since it makes not that much sense what the kernel reports and what’s happening behind the scenes.

    And based on my experiences tuning cpufreq/DVFS OPP for Allwinner SoCs it also makes not that much sense to define a lot of lower DVFS OPP since more fine grained control of the upper frequencies is key to more performance in high load scenarios.

        1. There’s a distinct difference between having a Source License and not paying Microsoft. The presumption is that there are no code concerns, which is a bit different from my concern and that of many others, which is that I’m paying Microsoft money for each Pi, period. Even with a source license, there’s typically a per-unit cost that you still pay the vendor on the royalties side of things. With GPL, it’s merely compliance as the remuneration. In this case, I’d rather not be beholden to them…or to someone paying money to maintain the fork, which is what this is.

      1. >so when do they release it under an open source license?

        If they intended to ever release the code why would they have used an RTOS that they can only ship as a compiled binary?

        1. Because they never intended to Open Source anything at the times these cores were designed. They were intended for set-top and varying Android original task sets for business customers. They had a bit of a change of heart over time since that time. Having said this, I’d think they’d re-think the RTOS and re-work the VC-IV/VI firmware to be something comparable to the current and use Zephyr, FreeRTOS, etc. there instead. Definitely lower licensing costs (It costs Broadcom per unit for this…)

      2. Releasing the source might actually endanger millions of existing installations if static analysis tools or security experts found bugs in the code. It’s always the case when commercial code has been leaked. RPi foundation does not want to increase the risk of existing user base with shortsighted opportunism. Future RPi versions might run a completely free firmware, but it needs to be started from scratch like the VC4 GL driver.

  4. “It should be noted that glxgears utility relies on OpenGL, and most Arm GPUs use OpenGL ES, so it’s quite possible several of the commands were emulated on the CPU. ”
    The GPU is capable of OpenGL 2.1 and OpenGLES 3.2.
    We’re talking about glxgears here, it’s not exactly Crysis. Even if there were emulated OpenGL calls it would have been unlikely to hit them.
