The most common way to use a development board is to simply go to the manufacturer's website and download the images from there. They made the hardware, after all, so they should be the most knowledgeable about their platform. But that is not always true, as tkaiser found out when he ran some Phoronix benchmarks on the Banana Pi M2+ (aka BPI M2+) board with SinoVoIP (the manufacturer), Armbian, and Raspbian images. The results speak for themselves.
… while FLAC audio encoding not so much.
So what’s going on here? First, John the Ripper is a multi-threaded application, while FLAC audio encoding works mostly on one thread, so the SinoVoIP image does not seem to be negatively impacted by tasks using a single core. Another clue comes from a comment posted after SinoVoIP released a new Android 4.4 firmware on April 11:
the maximum clockspeed for bpi m+ is just 1Ghz not 1.2 as advertised and cpu cores stops after full load
So the massive performance degradation in multicore benchmarks appears to be related to the CPU throttling implementation, with the SinoVoIP image simply killing cores instead of decreasing frequency in order to manage the CPU temperature. That means multi-threaded tasks may end up running on a single core after a short time with the SinoVoIP image, instead of on the 4 cores of the Allwinner H3 processor as with the Armbian and Raspbian images. The small performance difference in single-threaded tasks is also explained by the lower maximum CPU frequency, as Allwinner H3 is rated to run at 1.2 GHz, but SinoVoIP capped it at about 1 GHz.
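To get a feel for why killing cores hurts multi-threaded work so much more than lowering the clock, here is a rough back-of-the-envelope sketch. It assumes throughput scales perfectly with the number of active cores and with clock speed, which real workloads only approximate. The `relative_throughput` helper and the 648 MHz throttled clock are illustrative choices, not measurements.

```python
def relative_throughput(active_cores, freq_mhz):
    """Idealized throughput in 'core-MHz', assuming perfect multi-core scaling."""
    return active_cores * freq_mhz


# Core killing: a single core left running at the 1008 MHz cap.
killed = relative_throughput(active_cores=1, freq_mhz=1008)

# Frequency throttling: all four cores stay online at a lower clock
# (648 MHz is only an illustrative throttled frequency).
throttled = relative_throughput(active_cores=4, freq_mhz=648)

# Ratio of the two strategies under this idealized model.
ratio = round(throttled / killed, 2)
print(killed, throttled, ratio)
```

Under these assumptions, four heavily throttled cores still deliver roughly two and a half times the throughput of one core at full speed, which is consistent with the direction of the benchmark results even if the exact numbers differ.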
tkaiser explains further:
Unfortunately SinoVoip again tries very hard to ignore any bug fixes or improvements which will lead to the BPi M2+ being the slowest H3 board ever. Their THS settings limit the CPU clockspeed to 1008 MHz (compare with the 1.2GHz they advertise with) and to killed CPU cores instead of lowering the CPU clockspeed. So chances are great that you end up with H3 running just at 1008 MHz and only one active CPU core after running heavy stuff on the board.
While Armbian already takes special precautions for the M2+ to bring back killed CPU cores and implements sane throttling (240MHz to 1200MHz) SinoVoip chose to ignore all of this.
What’s even more frustrating is that, as I understand it, all that is needed are some modifications to script.bin (the binary form of the FEX file, Allwinner’s configuration file), specifically to the cooler_table and dvfs_table sections. Hopefully, SinoVoIP, and potentially other manufacturers, will read this post so that they can provide optimized images for their Allwinner boards and devices. In the meantime, you’d be better served by using Armbian images, unless you need to run Android, although simply replacing script.bin should help.
Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager, and starting to write daily news, and reviews full time later in 2011.
5 Replies to “Software Matters, or How SinoVoIP Crippled Banana Pi M2+ Performance.”
It's not only software; hardware also matters.
Check the big differences between Orange Pi One and BPiM2+ using the same Allwinner H3 but with hardware voltage control capable of running up to 1200MHz.
The three tests were as follows: the 1st two were both Armbian but the first time with SinoVoip’s THS settings, the 2nd time with the linux-sunxi community ones. I let cpuburn-a7 start before as an example for a very heavy workload which reliably killed all CPU cores but one. Therefore with SinoVoip’s settings (Allwinner’s defaults in fact) there was one CPU core running at 1008 MHz remaining, with Armbian settings the killed CPU cores came back and cpufreq could reach 1.2Ghz when single threaded tests ran.
The 3rd test was then made with their genuine Raspbian 8.0 image. I refrained from letting cpuburn-a7 kill all CPU cores beforehand, so the benchmark itself led to 1 CPU core being killed while running the 1st test and a 2nd while running C-Ray. And max cpufreq was still only 1008 MHz for no reason other than blindly taking Allwinner’s BSP defaults and not monitoring system behaviour.
I still hope that vendors get the idea that these settings matter, and that benchmarkers realize that they’re often not testing hardware but the quality of these settings instead. By tweaking cpufreq scaling and dvfs settings on any more recent SoC you’re able to improve performance a lot. Allwinner’s BSP settings are just the best example of really bad settings.
The test results between Oranges and the Banana Pi M2+ can not be compared since LoveRPi used active cooling when he was doing the initial round of tests. When using heatsink+fan Orange Pi One and BPi M2+ perform absolutely identical as long as we’re talking about CPU performance (same SoC, same clockspeed –> same performance).
Without a fan it gets interesting: since the M2+ lacks a programmable voltage generator, VDD_CPUX cannot be lowered when lower clockspeeds are used, and therefore throttling works very inefficiently. When BPi M2+ already has to throttle down to 648 MHz (all the time being fed with 1.3V), Orange Pi One will jump between lower and higher clockspeeds and also between 1.1V and 1.3V.
And Orange Pi PC (or the other Oranges) will be even faster since their voltage regulator can be adjusted in finer steps, so when running really heavy workloads the H3 boards with the most advanced voltage regulator will show the best performance, since the most important factor is limiting heat emissions by limiting VDD_CPUX 🙂
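The reason VDD_CPUX matters so much can be sketched with the usual first-order approximation that dynamic CPU power scales with voltage squared times frequency. The snippet below compares the same clock at the two voltages mentioned above. Pairing 1.1V with 648 MHz is an assumption made for illustration, static leakage is ignored, and `relative_dynamic_power` is a hypothetical helper.

```python
def relative_dynamic_power(voltage_v, freq_mhz):
    """First-order dynamic power estimate (arbitrary units): P ~ V^2 * f."""
    return voltage_v ** 2 * freq_mhz


# Fixed voltage: throttled to 648 MHz but still fed with 1.3V.
fixed_voltage = relative_dynamic_power(1.3, 648)

# Adjustable voltage: same clock at 1.1V (assumed pairing, for illustration only).
lowered_voltage = relative_dynamic_power(1.1, 648)

# Fraction of dynamic power saved by lowering the voltage at the same clock.
saving = 1 - lowered_voltage / fixed_voltage
print(f"{saving:.1%}")
```

In this simplified model, dropping from 1.3V to 1.1V at the same clock cuts dynamic power by a bit more than a quarter, so a board that can lower its voltage produces less heat and can stay at higher clockspeeds for longer.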
But to be fair: That’s only about CPU performance and unless you do number crunching on the wrong devices or try to produce misleading benchmarks the whole issue isn’t that relevant. When doing the stuff such a board is intended for BPi M2+ will consume slightly more energy and might get slower running heavy stuff on the CPU cores. But normally you won’t take notice.
And please keep in mind that this is also Orange Pi One: http://openbenchmarking.org/result/1603312-GA-1603277GA73&obr_sor=y&obr_hgv=Orange+Pi+One (here OPi One is magnitudes slower since Phoronix chose Xunlong’s Raspbian image that uses exactly the same wrong Allwinner defaults SinoVoip now uses)
The result of the aforementioned wrong THS settings combined with the usual Phoronix ‘benchmarking gone wrong’ style and the very same hardware is 5 to 10 times slower compared to sane settings.
Settings matter. In case of the H3 boards also software matters since the ‘Killing CPU cores instead of throttling’ behaviour is part of Allwinner’s BSP/Android kernel and should be fixed. But since no one wants to touch these kernel sources and since H3 boards already run with working Ethernet, USB and most other stuff with kernel 4.6.0 release candidates we simply implemented an ugly hack in Armbian that has been developed for Pine64 before (where the same problem with the BSP kernel exists): a simple service that checks cooling state every 5 seconds and brings back killed cores if temperature allows.
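A minimal sketch of such a watchdog, written in Python for illustration, could look like the following. This is not Armbian's actual service. The sysfs CPU hotplug files (`cpuN/online`) are standard Linux, but the thermal file location, its unit (degrees or millidegrees depending on the kernel), and the `SAFE_TEMP` threshold are assumptions that would need checking on the real board.

```python
import time
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")
# Assumption: location and unit of the temperature reading vary between kernels.
TEMP_FILE = Path("/sys/class/thermal/thermal_zone0/temp")
SAFE_TEMP = 70  # hypothetical threshold, in whatever unit TEMP_FILE reports


def offline_cores(cpu_root=SYSFS_CPU):
    """Return the 'online' control files of cores that are currently offline."""
    return [f for f in sorted(cpu_root.glob("cpu[0-9]*/online"))
            if f.read_text().strip() == "0"]


def restore_cores(temp_file=TEMP_FILE, cpu_root=SYSFS_CPU, safe_temp=SAFE_TEMP):
    """Bring offline cores back if the temperature allows; return how many."""
    temperature = int(temp_file.read_text().strip())
    if temperature > safe_temp:
        return 0  # still too hot, leave the killed cores alone
    cores = offline_cores(cpu_root)
    for online_file in cores:
        online_file.write_text("1")  # writing 1 asks the kernel to re-enable the core
    return len(cores)


def run_forever(interval_s=5):
    """Check the cooling state every interval_s seconds (needs root to write sysfs)."""
    while True:
        restore_cores()
        time.sleep(interval_s)
```

Calling `run_forever()` as root would poll every 5 seconds as described above. Keeping the paths as parameters makes the logic easy to try against a temporary directory before touching a real system.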
Weird result: At the moment a BPi M2+ or an Orange Pi One running Armbian might be up to 4 times faster than when running with the vendor’s OS images. And that’s not rocket science but just ‘taking care’ a little bit and learning from the past.
Help needed: We at Armbian currently try to improve THS settings for Orange Pi One/Lite, NanoPi M1 and Banana Pi M2+. In case anyone a bit more experienced owns such a board without heatsink applied to the SoC please read through http://forum.armbian.com/index.php/topic/1231-testers-wanted-improving-ths-settings/ and consider helping us improve the settings (every OS image and everyone might benefit from later!)