The most powerful 96boards development board – HiKey 960 – has finally been launched, and can be purchased for $239 on Aliexpress, Amazon US, Switch Sense (Japan), Seeed Studio, or All Net (Germany).
HiKey 960 specifications have not changed much since we found out about the board:
- SoC – Kirin 960 octa-core big.LITTLE processor with 4x ARM Cortex A73 cores @ up to 2.4 GHz, 4x Cortex A53 cores @ up to 1.8 GHz, and a Mali-G71 MP8 GPU
- System Memory – 3GB LPDDR4 SDRAM (PoP)
- Storage – 32GB UFS 2.1 flash storage + micro SD card slot
- Video Output / Display Interface – 1 x HDMI 1.2a up to 1080p, 1x 4-lane MIPI DSI connector
- Connectivity – Dual band 802.11 b/g/n/ac WiFi and Bluetooth 4.1 with on-board antennas
- USB – 2x USB 3.0 type A host ports, 1x USB 2.0 type C OTG port
- Camera – 1x 4-lane MIPI CSI, 1x 2-lane MIPI CSI
- PCIe Gen2 on M.2 Key connector
- 40 pin low speed expansion connector with +1.8V, +5V, DC power, GND, 2x UART, 2x I2C, SPI, I2S, 12x GPIO
- 60 pin high speed expansion connector: 4L MIPI DSI, 2L+4L MIPI CSI, 2x I2C, SPI (48M), USB 2.0
- Misc – LEDs for WiFi & Bluetooth, 4x user LEDs, power button, copper heatsink for CPU
- Power Supply – 8V-18V/2A via 4.75/1.7mm power barrel (EIAJ-3 Compliant); 12V/2A power supply recommended; PMU: Hi6421GWCV530, Hi6422GWCV211, Hi6422GWCV212;
- Dimensions – 85mm x 55mm
The video will eventually be uploaded to YouTube, but in the meantime I’ve embedded the Facebook video.
Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager, and starting to write daily news, and reviews full time later in 2011.
37 Replies to “96Boards Compliant HiKey 960 ARM Cortex A73 Development Board is Now Available for $239”
all that processing power with 10 year old hdmi…cough….why!!!!
Any information about other linux base distros?
The Kirin is a phone/tablet chip so MIPI-D/CSI is the hi-res interface (will go up to 4k). Using HDMI to drive a phone/tablet screen would just add unnecessary complexity and power draw. LeMaker could have added a bridge but that would have driven up the cost on an already pricey board. Hackaday.io has a few MIPI DSI to HDMI shield projects if you’re interested.
f*ck gen2 x4 🙁 no gen3
On-board flash storage is monstrous. RAM, eh, not so much. Great SoC, overall.
if mali g71 then mainline is a joke.
With the RK3399 this could be one of the most interesting boards. The price is also great.
What’s the problem with G71 and mainline?
Can we get a 1GbE NIC adapter using the M.2 mPCIe connector?
USB3 will carry 1Gb Ethernet all day…
Sure, i had in my mind that USB3 ports were for disks and NAS 🙂
Will this get any support other than android ? Ie linux mainline. I won’t hold my breath…
I was about to comment on how I thought the price was crazy high, then saw your response. What are you comparing with? It seems outrageous to me, but then I am desperately looking for a board with storage that can support the random IO of MySQL but need it to be less than US$100 because I plan to buy in high volume.
To everyone: Also, why m2? m2 storage is significantly more expensive than mSATA storage, and I wonder if this board can even saturate mSATA let alone m2? Note, I am still learning all these details about SBCs so if my comments are based on false understandings I would greatly appreciate learning how my current grasping understanding is wrong.
New GPU architectures are always supported by a closed-source blob at first. A few years later, maybe we can see an open-source GPU driver in mainline.
I think you can only use an SSD or a WiFi module.
Nice specs, minus the 1080p limitation. Dang! If only there was a case for this.
You’re talking about ‘random IO of MySQL’, then ‘saturate mSATA’ (which is sequential and not random IO) and then consider buying an Android phone without enclosure and Linux support for the task?
Good luck getting even correct specifications (LeMaker is amongst those vendors that really don’t care which words and numbers they randomly assemble to fill a web page with ‘specifications’ in the title), and if you want to build a product on this weird phone in 96boards format you should be aware that everything you stick into this M.2 thingie will project over the board, that the SoC isn’t HDMI capable (so you pay for a DSI to HDMI converter you don’t need) and that the SoC only has one USB port (so either the Type-A receptacles work or Type-C).
I would choose an ESPRESSOBin for the task instead and not a modified Android phone.
Isn’t that a different proposition, though? I mean, the statements ‘This GPU is supported via a binary blob, and thus will likely not run with the latest mainline’, and ‘This GPU is supported via a binary blob, and thus will not run with mainline per se’ are not equivalent to me.
Any idea where I can order 1pc of board in Europe ? I don’t want to pay any import tax and All Net is for companies only
The board has only an x1 PCIe Port. Take a look into the schemes: https://github.com/96boards/documentation/blob/master/ConsumerEdition/HiKey960/HardwareDocs/HiKey960_Schematics.pdf
It’s a x1 PCIe Interface. So, it’s as slow as SATA. I wish more a PCIe x4 Interface like the RK3399
I ran benchmarks on the HiKey 960 board and I don’t understand why it is so slow.
Does anybody have good experience with Cortex A72 or A73 and actually get the on-paper speed?
It looks like the Cortex A15 has the same speed as the A73.
Settings matter. I would first run github.com/ssvb/tinymembench (click on the ‘Wiki’ tab there or simply do a web search to compare results — tinymembench will tell you also about L1/L2 caches).
Then I would also try out sysbench even if this is the most stupid CPU test ever (it is a compiler test more or less but the one good thing is that it does not depend on memory bandwidth). You’ll need sysbench version 0.4.12 (0.5 shows lower numbers) and GCC 5.4 (GCC 4.x shows lower numbers, no idea what higher GCC versions do). If you’re on Ubuntu Xenial the arm64 distro package will do. The benchmark runs so briefly that throttling shouldn’t be an issue.
(on a quad-core Cortex-A53 running at 1.5GHz this will take 6 seconds when built for ARMv8, that’s also the reason you can’t compare with ARMv7 SoCs directly since there execution takes 16 times longer)
Reading the thread today, it turned out it was a problem with L2 cache not being enabled in the DTS file.
Well, there’s L2 cache being mentioned (that’s why I mentioned tinymembench yesterday) and some contradicting statements (DT/kernel vs. ATF), but no numbers. I wouldn’t call this an ‘it turned out’ situation 😉
Now that’s some interesting numbers over in the thread 🙂
You got a sysbench execution time of 4.5521s on the big cores and 4.9342s on the littles, which at least makes sense for the little cores (an ODROID-C2 running 4 A53 at 1.5 GHz scores 6 seconds, and 6 vs. 5 seconds is exactly what to expect when comparing 1.5GHz vs. 1.8GHz). The A73 cores finishing only in 4.5521s and not below 2 seconds illustrates the problem, or let’s say one problem. At least sysbench is not affected by memory bandwidth, so IMO it makes some sense to look for (L2) cache problems and SMP/HMP or, more generally speaking, scheduling issues.
This didn’t only start with ARMv8 SoCs, but these things or their vendors started to cheat on us on an impressive scale. We see CPU cores that throttle down to 700MHz while reporting they still run at 1200MHz (RPi 3), we saw SoCs reporting they run at 2.0GHz while capped to 1.5GHz due to thermal budget constraints (Amlogic 9xx), and maybe here we see another variation of the ‘thermal budget’ problem, allowing only one or two of the A73 cores to run at full/higher speed while the others are throttled down to pretty low values, while at the same time cpufreq via sysfs happily reports 2400 MHz.
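The 5 vs. 6 seconds reasoning above can be written out as a small calculation. This is only a sketch: it assumes execution time is inversely proportional to clock speed for the same core type and the same sysbench binary, and the function name `expected_time` is made up for illustration.

```python
def expected_time(ref_seconds: float, ref_ghz: float, target_ghz: float) -> float:
    """Scale a measured run time to another clock speed.

    Assumption: same core type and same binary, so time ~ 1 / frequency.
    """
    return ref_seconds * ref_ghz / target_ghz


# Reference from the comment: ODROID-C2, 4x A53 at 1.5 GHz, about 6 seconds.
a53_expected = expected_time(6.0, 1.5, 1.8)
print(f"expected for 4x A53 at 1.8 GHz: {a53_expected:.2f} s")
print("measured on the HiKey 960 little cores: 4.9342 s")
```

The same scaling cannot be applied across core types (A53 vs. A73), since instructions per clock differ.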
For Android devices this strategy would at least make some sense (you need only high single thread CPU performance but the whole cluster performance is not important since everything important runs on VPU/GPU anyway and the count of CPU cores is only necessary to sell flagship phones/tablets since clueless Android customers love high core counts).
If I were interested in Android toys like the HiKey 960, that would at least be the first thing I’d try to look into: thermal budget settings. Which BLOB contains the strategy to operate within the tight thermal budget? (These days the cpufreq/kernel settings are theoretical BS anyway; the strategy for which clockspeeds are allowed in which situation lives somewhere else, e.g. in ATF or some kernel code that deals with ‘budget cooling’, trying to dynamically downclock CPU, GPU, VPU and DRAM to prevent this thing from catching fire.)
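For anyone who wants to watch what the kernel claims while a benchmark runs, here is a minimal sketch that reads the usual Linux sysfs files for per-core cpufreq and thermal zones. It assumes the standard `scaling_cur_freq` (kHz) and `thermal_zone*/temp` (millidegrees Celsius) layout; whether those files exist on a given HiKey 960 kernel is not confirmed here, and as said above the reported clockspeed may not be the real one. The helper names are made up.

```python
from pathlib import Path


def read_int(path: Path):
    """Return the integer stored in a sysfs file, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def snapshot(sysfs_root: str = "/sys"):
    """Collect reported CPU clocks (MHz) and thermal zone temperatures (deg C)."""
    root = Path(sysfs_root)
    freqs = {}
    pattern = "devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq"
    for f in sorted(root.glob(pattern)):
        khz = read_int(f)
        if khz is not None:
            freqs[f.parent.parent.name] = khz / 1000.0  # kHz -> MHz
    temps = {}
    for t in sorted(root.glob("class/thermal/thermal_zone*/temp")):
        milli = read_int(t)
        if milli is not None:
            temps[t.parent.name] = milli / 1000.0  # millidegrees -> degrees
    return freqs, temps


if __name__ == "__main__":
    freqs, temps = snapshot()
    for cpu, mhz in freqs.items():
        print(f"{cpu}: {mhz:.0f} MHz (as reported)")
    for zone, deg in temps.items():
        print(f"{zone}: {deg:.1f} C")
```

Calling `snapshot()` in a loop during a sysbench run gives a rough picture; numbers that stay at 2400 MHz while execution times get worse would point to throttling happening somewhere cpufreq does not see.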
BTW: I fail to understand leo-yan’s numbers completely since way too low (15.0594s and 9.0935s). Would be interesting to see how his L2 cache settings look like (and of course the obvious: simply letting tinymembench run to get a clue what’s going on)
To further elaborate on my babbling above: if I were interested in getting a clue how this thing operates in as little time as possible, the next test would be four other sysbench runs. While sysbench is a horribly misleading benchmark, when used appropriately it can be somewhat useful (since it is not memory bandwidth dependent and scales linearly with core count).
First run on cpu1 (little) is there to confirm the ‘linearly scaling’ claim. You should see with this test a result of a little bit less than 20 seconds.
2nd run on cpu4 (big) might show an execution time as low as 8 seconds or maybe even below. If that’s the case then at least this single CPU core was allowed to run constantly at the upper clockspeed. When execution times while running on 2 and 3 big cores do not scale linearly I would say you already nailed the ‘problem’ down. If results look strange, retest with just --cpu-max-prime=10000 to prevent throttling effects, and pause a little between tests while monitoring SoC temperatures.
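The ‘linearly scaling’ check can be sketched as follows. The 4.9342s and 4.5521s figures are the four-thread cluster times quoted earlier in the thread; the 8 second single big core time is only the hypothetical value mentioned above, not a measurement, and both function names are made up.

```python
def single_core_estimate(cluster_seconds: float, cores: int) -> float:
    """Time one core would need if the workload scales linearly with core count."""
    return cluster_seconds * cores


def scaling_efficiency(single_core_seconds: float, cores: int,
                       measured_seconds: float) -> float:
    """1.0 means perfect linear scaling; lower means cores were held back."""
    ideal = single_core_seconds / cores
    return ideal / measured_seconds


# Little cluster: 4 threads took 4.9342 s, so one core should need just under 20 s.
print(f"one little core: about {single_core_estimate(4.9342, 4):.2f} s")

# Big cluster: IF a single A73 really finished in 8 s (hypothetical),
# then 4.5521 s on four cores would be far from linear scaling.
print(f"big cluster efficiency: {scaling_efficiency(8.0, 4, 4.5521):.2f}")
```

An efficiency well below 1.0 on the big cluster, combined with a good single-core result, would support the thermal budget explanation rather than a per-core problem.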
BTW: Due to the package format with DRAM on top of the SoC it would be surprising if DRAM clockspeed would not also be part of an overall ‘budget cooling’ strategy (massively downclocking DRAM if the thing starts to overheat which will of course have an impact of benchmarks that depend on memory bandwidth unlike sysbench above).
And a last one: I was asking for leo-yan providing tinymembench numbers to look for his memory performance in general and whether there’s also a high increase in latency when jumping from 2097152 (2MB) to 4194304 (4MB) –> tinymembench used to detect presence and sizes of caches.
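To show what ‘looking for a high increase in latency’ between two buffer sizes means, here is a small sketch that finds the biggest latency step in a list of (buffer size, latency) pairs. The sample values are invented placeholders purely to exercise the function; they are not tinymembench output from any board, and the function name is made up.

```python
def largest_latency_jump(samples):
    """Return (size_before, size_after, ratio) for the biggest latency step.

    samples: list of (buffer_size_bytes, latency_ns), sorted by size.
    A large ratio between two neighbouring sizes hints that the smaller
    size still fits in a cache level and the larger one does not.
    """
    best = None
    for (s0, l0), (s1, l1) in zip(samples, samples[1:]):
        if l0 <= 0:
            continue  # skip entries that cannot form a meaningful ratio
        ratio = l1 / l0
        if best is None or ratio > best[2]:
            best = (s0, s1, ratio)
    return best


# Placeholder numbers only, NOT real measurements.
fake = [(1048576, 10.0), (2097152, 12.0), (4194304, 60.0), (8388608, 80.0)]
before, after, ratio = largest_latency_jump(fake)
print(f"biggest step between {before} and {after} bytes (x{ratio:.1f})")
```

With real numbers, a big step between 2097152 and 4194304 would suggest a working 2MB L2 cache, while a flat curve would suggest the cache is not doing its job.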
And another obvious test is to remove heatsink/fan, run a ‘stress -c 8’ and then tinymembench in parallel again.
Maybe it’s worth linking to these comments here in the thread over there if you get a taskset -c4 sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=1 execution time significantly lower than 10 seconds.
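The runs discussed above can be generated from one helper so that every test uses identical options. This is a sketch only: it reuses the sysbench 0.4.12 style flags from the command line quoted above, and it assumes the little cores are cpu0-3 and the big cores cpu4-7, which follows from ‘cpu1 (little)’ and ‘cpu4 (big)’ but is not confirmed here. The function name is made up.

```python
import shlex


def sysbench_cmd(cpus: str, threads: int, max_prime: int = 20000):
    """Build the argv list for a sysbench CPU run pinned to the given cores."""
    return [
        "taskset", "-c", cpus,
        "sysbench", "--test=cpu", f"--cpu-max-prime={max_prime}",
        "run", f"--num-threads={threads}",
    ]


# One little core, then one, two and three big cores (core numbering assumed).
runs = [("1", 1), ("4", 1), ("4-5", 2), ("4-6", 3)]
for cpus, threads in runs:
    print(shlex.join(sysbench_cmd(cpus, threads)))
```

The printed lines can be pasted into a shell on the board, or the lists passed to `subprocess.run` directly.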
I agree, L2 cache active or not might be part of the problem. But I believe numbers already indicate that there’s at least one other ‘problem’ present.
@m][sko: One final remark: both tools I recommended help you interpret the validity of results (covariance, standard deviation) without having to stupidly repeat the same set of tests ‘just in case’. With tinymembench you get this shown in brackets when exceeding a specific percentage (so when there’s 3% shown you already know that there’s something going on invalidating your results –> throw numbers away, do some investigation, test again). With sysbench you need to compare min, avg, max and ‘approx. 95 percentile’. If ‘approx. 95 percentile’ and ‘avg’ vary too much you also know you’ve to throw results away. At least if it’s about understanding what’s going on and not just producing numbers without meaning.
I relied on the scientifically proven “I updated dts file and it really go up” comment for this. But in the end, it appears not to be the case with the new comments.
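The sysbench sanity check described above boils down to comparing two numbers. In this sketch the 5% tolerance is an arbitrary assumption (the comment only says ‘vary too much’), and the function name is made up.

```python
def looks_valid(avg_ms: float, p95_ms: float, tolerance: float = 0.05) -> bool:
    """True if 'approx. 95 percentile' stays within `tolerance` of 'avg'.

    The 5% default is an assumed threshold, not something sysbench defines.
    """
    if avg_ms <= 0:
        raise ValueError("avg_ms must be positive")
    return abs(p95_ms - avg_ms) / avg_ms <= tolerance


print(looks_valid(10.0, 10.2))  # small spread: keep the result
print(looks_valid(10.0, 13.0))  # large spread: throw it away and investigate
```

A failed check says nothing about the cause; throttling, background load or scheduler migrations between big and little cores would all show up the same way.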
Somebody is going to review Mediatek X20 (Cortex A72) development board on CNX, so maybe checking the same benchmarks on it would be useful, provided they are available on Android too.
Tinymembench should be useable on Android, at least @ssvb explained in the readme how to crosscompile for Android. No idea about sysbench.
But both tools provide only insights when used correctly (and the problem with sysbench is really that it’s a compiler benchmark so only if using same version built with same compiler and settings you get comparable results).
I think the problem tablet/phone A72/A73 SoCs face is that they must feature specifications that look good (customers buying numbers), they must show high irrelevant performance numbers (detecting being AnTuTu benchmarked just like ‘Clean Diesels’ detecting being benchmarked by the EPA 😉 ), must show good real world performance (GPU! VPU! High single threaded CPU performance), nice battery life (big.LITTLE) and then all the heat has to dissipate somehow. Without certain strategies for dealing especially with the latter (‘thermal budget’) it’s impossible to meet all of the above criteria (‘running heavy loads on all CPU cores simultaneously’ is definitely not part of the criteria).
And then weirdness happens, those Android toys are transformed into ‘dev boards’ and people want to run data center workloads on tablet/phone SoCs that were never designed for something like that.
I posted a question on the ARM community forum,
as ARM celebrated the A73 as having the same speed as an Intel Core,
but I didn’t see those results 🙁
I will paste all results to the 96Boards forum.
Well, ARM celebrated an A73 to be made with 10nm process and up to 2.8GHz clockspeeds while we’re talking here about something made in TSMC’s 16nm FFC process, officially limiting max cpufreq to 2.4GHz and obviously prioritizing the GPU (larger die, most probably also preferred wrt thermal budget) which IMO makes sense on a smartphone SoC like this. Just try the sysbench tests running only on 1, 2 or 3 big cores and we know a little bit more.
BTW: When I see, for example, a Marvell ARMADA 8040 server/router/NAS SoC containing 4xA72 ‘up to 2 GHz’ then I know that this SoC is made for heavy CPU loads running on 4 cores simultaneously at 2GHz if heat dissipation matches (large heatsink + fan). I would never expect the same from phone or tablet SoCs since these designs are made for something else entirely, and usually the first hurdle is understanding/optimizing the thermal budget stuff.
It’s fun over there interpreting the numbers.
Just for reference, m][sko has found a 4.9-based kernel which solves his performance issue.
All numbers are much higher. Results are on the 96Boards forum.
Huh? ‘Solved’? The kernel he tried in between had broken cpufreq scaling, now numbers look better but IMO still far away from ‘solved’. Usually coming up with performant settings takes a lot more time (and needs intensive monitoring as suggested over there, especially if you’re dealing with a design that is prone to overheating)