
Mecool VS-RK3399 Board Linux Benchmarks

September 27th, 2017

Last weekend, I showed how to install Debian and build a Linux image from source on the VS-RD-RK3399 board (aka Mecool VS-RK3399), but at the time I did not run any benchmarks on the board. We already have plenty of benchmarks for Rockchip RK3399 in Android, so instead I started by installing the latest Phoronix Test Suite in Debian:


… and ran the same tests I had run on NanoPi NEO 2 earlier:

For whatever reason, OpenSSL and Mafft failed to download, but we still have the other benchmarks to compare with. Note that the Debian image is likely not optimized, and while the system runs an Aarch64 kernel, the rootfs is only 32-bit, which may have affected some of the benchmarks.

But let’s see what we’ve got, starting with John the Ripper password cracker, a multi-threaded benchmark.


We’d normally expect hardware platforms based on the Rockchip RK3399 SoC to outperform all the Cortex A7, A53, or A17 based boards in the list, but the MiQi board with a quad-core Cortex A17 processor @ 1.8 GHz, and the BPI-M3 board with an octa-core Cortex A7 processor @ 2.0 GHz, both beat the VS-RK3399 and its hexa-core processor with two Cortex A72 cores @ 1.8 GHz and four Cortex A53 cores @ 1.4 GHz. BPI-M3 is even twice as fast in this test.


C-Ray is also a multi-threaded benchmark, but here Rockchip RK3399 SoC shines, making VS-RK3399 the fastest platform of the lot, also beating MeLE PCG02U TV stick (MeUbuntu 14.04.3) powered by an Intel Bay Trail Z3735F processor.


Smallpt is another multi-threaded benchmark, and the VS-RK3399 board does well, but it’s still beaten by the Intel TV stick (OpenMP might help here?) and by Banana Pi M3.


The Rockchip RK3399 board is the fastest ARM platform in the Himeno benchmark, a linear solver of the pressure Poisson equation, but the Bay Trail TV stick is well ahead, likely due to specific x86 instructions and/or optimizations.


Finally, for FLAC audio encoding, VS-RK3399 is the best ARM platform (in the tested lot) by a wide margin, but Intel is ahead with their more advanced SIMD instructions.

So the Rockchip RK3399 processor will outperform the other ARM boards tested in single-threaded tasks thanks to its Cortex A72 cores, but in some multi-threaded tests, octa-core Cortex A7 and quad-core Cortex A17 platforms may deliver better results.

The VS-RD-RK3399 board comes with a 32GB Samsung eMMC 5.0 flash that is supposed to deliver 246/46 MB/s R/W speeds, and 6K/5K R/W IOPS.

I tested it with iozone using a 100MB file:


The read speed is close to the theoretical limit, but the write speeds are well above it, maybe because of some caching.
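To put the IOPS rating on the same scale as the MB/s figures, it can be converted to throughput. The sketch below assumes the 6K/5K IOPS numbers refer to 4 KiB random transfers, which the spec quoted above does not state, so treat the results as a rough guide only. The function name is just for illustration.

```python
# Illustrative conversion only: the 4 KiB transfer size is an assumption,
# since the spec quoted in the article gives IOPS without a block size.
BLOCK_SIZE_BYTES = 4096


def iops_to_mb_per_s(iops, block_size=BLOCK_SIZE_BYTES):
    """Convert an IOPS rating into MB/s (1 MB = 10**6 bytes)."""
    return iops * block_size / 1_000_000


for label, iops in (("read", 6000), ("write", 5000)):
    print(f"{label}: {iops} IOPS ~ {iops_to_mb_per_s(iops):.1f} MB/s")
```

Under that assumption, small random transfers would be expected to land in the low tens of MB/s, far below the 246/46 MB/s figures.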

I switched to Gigabit Ethernet performance testing starting with a full duplex iperf test:


Not quite optimal, so let’s look at upload only:


and download only:


Both of which are quite good. I had been told that IRQs may all be handled by CPU0 (a Cortex A53 core on this board), and that the following changes may improve performance:
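As general background, Linux exposes the CPU affinity of each interrupt as a hexadecimal bitmask in /proc/irq/<N>/smp_affinity, where bit n stands for CPU n. The sketch below only shows how such a mask is derived from CPU numbers; it is an illustration, not the exact changes applied on the board. It assumes the usual RK3399 numbering, where cpu0 to cpu3 are the Cortex A53 cores and cpu4 and cpu5 are the Cortex A72 cores, and the function name is hypothetical.

```python
def affinity_mask(cpus):
    """Return the hex bitmask (without '0x') selecting the given CPU numbers."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu  # bit n of the mask selects CPU n
    return format(mask, "x")


# Assumed RK3399 layout: cpu0-3 = Cortex A53, cpu4-5 = Cortex A72.
for cpus in ([0], [3], [5], [4, 5]):
    print(f"CPUs {cpus} -> mask {affinity_mask(cpus)}")
```

The resulting string is what would be written to the smp_affinity file of the Ethernet interrupt, whose number can be looked up in /proc/interrupts.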


So I repeated the tests, and something impossible happened:


We’re not supposed to get 1.35 Gbps on Gigabit Ethernet… So I tried again for a longer period of time (10 minutes):


Same results. But the output on the server side looks more realistic:


and it does improve a little compared to the first test without the tweaks.
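To see why a figure above 1 Gbps cannot be right, it helps to work out how much TCP payload fits on the wire in one direction. The sketch below is a back-of-the-envelope calculation assuming a standard 1500-byte MTU, IPv4, and no jumbo frames; none of these settings are stated in the tests above, so they are assumptions.

```python
LINK_RATE_MBPS = 1000  # Gigabit Ethernet line rate, per direction

# Per-frame Ethernet overhead in bytes:
# preamble + SFD (8) + MAC header (14) + FCS (4) + inter-frame gap (12)
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12
IPV4_HEADER = 20
TCP_HEADER = 20


def tcp_ceiling_mbps(mtu=1500, tcp_options=0, link_rate=LINK_RATE_MBPS):
    """Upper bound on TCP payload throughput for one direction of the link."""
    payload = mtu - IPV4_HEADER - TCP_HEADER - tcp_options
    on_wire = mtu + ETHERNET_OVERHEAD
    return link_rate * payload / on_wire


print(f"no TCP options:      {tcp_ceiling_mbps():.1f} Mbps")
print(f"with TCP timestamps: {tcp_ceiling_mbps(tcp_options=12):.1f} Mbps")
```

Under these assumptions, each direction tops out somewhere around 940 to 950 Mbps, which is why the server-side numbers are the more believable ones.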

  1. tkaiser
    September 27th, 2017 at 20:08 | #1

    Would be interesting to repeat the network tests with iperf3 instead while having a look with htop whether a CPU bottleneck occurs (very likely with bidirectional test) and then see what’s happening when sending IRQ processing to cpu5 instead of cpu3.

    BTW: some of the multithreaded Phoronix benchmarks show a standard deviation that is too high, so the numbers for your RK3399 device would most probably look better with better heat dissipation. Also, BPi M3 features Cortex-A7 cores and not A53, and the reason why the slow Cortex-A7 can shine here is that PTS (as so often) uses bizarre compilation settings.

  2. September 27th, 2017 at 20:23 | #2

    @tkaiser
    Yes, cooling is somewhat of an issue on multi-threaded benchmarks, but nothing massive:

  3. GanjaBear
    September 27th, 2017 at 20:58 | #3

    C-ray is a multithreaded benchmark, stressing all available cores. Please correct the error.

  4. September 27th, 2017 at 21:01 | #4

    @GanjaBear
    Thanks. Fixed.

  5. blu
    September 27th, 2017 at 21:03 | #5

    Typo: C-Ray is a single threaded -> multi-threaded

  6. GanjaBear
    September 27th, 2017 at 21:14 | #6

    Thx @cnxsoft. If you want to see the full potential of the SoC in that benchmark, rerun the test at ‘CFLAGS=-Ofast’

  7. tkaiser
    September 27th, 2017 at 21:29 | #7

    @GanjaBear
    You’re absolutely right, but one would have to patch the makefiles for some of these Phoronix ‘benchmarks’. I had numbers differing by 14 times when comparing reasonable compiler flags with the stuff the PTS uses for whatever reasons. All of this has been known for ages, BTW: https://www.phoronix.com/forums/forum/software/mobile-linux/28245-arm-cortex-a9-pandaboard-es-benchmarks?p=320735#post320735

  8. September 27th, 2017 at 21:32 | #8

    @GanjaBear
    I’ve run

    Results:

    vs 98 seconds with -O3. That’s quite a difference (thermal throttling is more obvious here too). Do you know what happened at the compiler level?

    Some fun comparison against other platforms:

  9. blu
    September 27th, 2017 at 22:48 | #9

    Re ‘-Ofast’ – keep in mind that it enables -ffast-math, so some fp computations may go haywire. Just saying (should be safe in both c-ray and smallpt).

    But it’s really curious to see how well 8x A7 @ 2GHz perform on the embarrassingly-parallel c-ray and smallpt. Who’d have thought?.. ; )

  10. Ajira Tech
    October 12th, 2017 at 12:40 | #10

    The specs and the performance benchmark numbers are very useful. But I feel it is missing the following two items that most users would want to know.

    Has anyone tested read/write throughput to a USB 3.0 hard disk?

    Also, any ideas if it’s possible to install Kodi with GPU acceleration along with Linux?

    Thanks,

  11. tkaiser
    October 12th, 2017 at 13:33 | #11

    @Ajira Tech
    USB3 performance should be the same as with RK3328, since it shares the same USB3 host controller with RK3399. A web search for ‘rock64 iozone close to 400 mb/s’ should show numbers (and ‘close to 400 mb/s’ should already give an idea what to expect with fast SSDs –> translates to ‘fast enough for any HDD since the HDD is always the bottleneck’ 😉 )

    But of course you need a UASP (USB Attached SCSI) capable disk enclosure to get full speed.
