ARM DynamIQ Improves on big.LITTLE Technology, Supports Up to 8 Heterogeneous Cores in a Single Cluster

ARM unveiled big.LITTLE technology in 2011. It combines clusters of low power cores such as Cortex A7 or A53 with clusters of high performance cores such as Cortex A15 or A72, with the system assigning tasks to the best processor for the job in order to optimize battery life. big.LITTLE supports up to 4 cores per cluster, and you cannot mix different types of cores within a single cluster. ARM DynamIQ changes all that, as up to 8 cores are supported within one cluster, and you can mix low power and high performance cores within a given cluster.

ARM DynamIQ multicore microarchitecture will be available for all new ARM Cortex-A processors starting this year, and targets automotive, networking, server, and primary compute devices. ARM claims it’s especially advantageous for artificial intelligence due to better performance, and for autonomous driving due to increased safety, and that it allows for much faster response from accelerators. Based on the slide below showing the evolution of multi-core implementations in ARM SoCs, the company might as well have called it ARM’s Just Do What You Want™ multi-core technology, especially as they explain that any configuration is possible, such as 1+7 (1x big and 7x LITTLE CPUs), 2+4, 1+3, etc.
One advantage of using multiple heterogeneous cores within a single cluster is that it’s less “expensive” to migrate tasks from a LITTLE processor to a big processor, with increased efficiency as processors share the same memory (and cache?). Most big.LITTLE implementations going forward are likely to use a single cluster design, unless they need more than 8 cores.
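
To make the "any configuration is possible" claim concrete, here is a minimal Python sketch that enumerates the big+LITTLE splits that fit in a single cluster. It assumes only what the article states (up to 8 cores per cluster, any mix). The function name `cluster_configs` is made up for illustration, and counting all-big or all-LITTLE clusters as valid is an assumption rather than something taken from ARM's material.

```python
def cluster_configs(max_cores=8):
    """Return every (big, little) split whose total is between 1 and max_cores.

    Assumption: any mix is allowed, including all-big and all-LITTLE clusters.
    """
    configs = []
    for total in range(1, max_cores + 1):
        for big in range(total + 1):
            configs.append((big, total - big))
    return configs


if __name__ == "__main__":
    configs = cluster_configs()
    # The three examples quoted in the article: 1+7, 2+4 and 1+3.
    for example in [(1, 7), (2, 4), (1, 3)]:
        print(example, example in configs)
```

Under a 4-core limit with one core type per cluster, only 4 layouts per core type exist, which is why the single-cluster design leaves so much more room.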

You may find more details on the DynamIQ technology page, ARM’s related community blog post, and the presentation slides.

Shomari
Guest

That’s the limit I ran into when making the A53 RK3368 big.LITTLE implementation hack. I’m able to run all cores simultaneously at 1608 MHz, but there are losses from the existing big.LITTLE cluster migration because of the four-core limit. I negated as many of the inefficiencies as possible using HMP tunables, but I would’ve been better off had those cluster limits not existed for A53.

EDIT: images never work for me here

http://i.imgur.com/uNAV2lp.jpg

http://i.imgur.com/i0SpXMf.jpg

RK3368 – OS 5.1.1

blu
Guest

@Shomari
What voltages did you use for the LITTLE cluster? While the big cluster on the 3368 should have no issues at that clock, keeping the LITTLE cluster at 1.608 GHz is not advisable, as it’s a low-clock-tuned design.

Shomari
Guest

No problems at all for the past ~ 7 months on two test boxes.

Voltage for all cores maxes out at 1.4 V.

As far as I have surmised, the cores are all identical bar cache sizes, most significantly.

I only ventured into this when I noticed RK’s stock reference voltages were scaling across ALL cores at the same levels, with some configurations taking the LITTLE cores higher than the big. Can’t go by what’s in the tables, because RK’s DVFS/throttling methods combine to make some weird things happen on a live system.

EDIT: I’ve put out firmware with these hacks for the past 7 months and haven’t had any reports of stability/heat issues. I’ve long theorized that RK’s throttling implementation on A53 caused more problems than it solved. I purposely turned it off as robustly as I could.

Theguyuk
Guest

I am sure I read a while back that getting near the limits of Moore’s law was behind the growth in multi-core adoption. Also that running on the performance governor can choke the cores in some implementations. I have no real world experience of it other than tweaking Android TV box ROMs using Kernel Adiutor (this can brick your TV box! Always have the box firmware, and know how to re-flash it).

wj
Guest

I always consider big.LITTLE a silly designation. A CPU core should be able to run at high performance with higher energy consumption, as well as at low performance with lower energy consumption.

blu
Guest

@Shomari
The cores in the two clusters are not identical – their operating points are different, and the voltages of the same-clock operating points are different, at least according to the vendor-supplied DVFS tables. What stress tests have you run on the so-clocked 3368?

Shomari
Guest

If I was sticking to vendor-supplied convention, I would’ve left everything reference and stock. The idea is to get the most performance while maintaining stability. After seven months I still see no issues with daily usage. I haven’t seen any reported among thousands of downloads, and we’d have heard of degradation/instability by now. The only feedback has been typical, nothing out of the ordinary.

What stress tests do you recommend?

Theguyuk
Guest

I read about overclocking Allwinner A20 chips and came across this.

Quote:

“Overclocking”

As per the definition of the word “overclocking” is not possible since that would mean “frequency scaling beyond the specs”. The problem with cheap ARM SoCs is that they’re not subject to an expensive QA/selection process where each SoC will be tested extensively and then labeled/sold in different categories depending on the upper limits it’s able to achieve (this is standard with x86 for example: chips from the same wafer will be tested individually and be sold for a few hundred bucks more or less depending on how many CPU/GPU cores work reliable at which clock speeds).
One A20 SoC from a specific production batch might work reliable with 1.2GHz (which is rather unrealistic) while another starts to throw errors with 1.0GHz. The default cpufreq upper limits take that into account and can be considered sane/safe defaults. And while it’s possible to adjust or define additional operating-points in the kernel sources to gain some more speed (with increased voltage) this should be considered experimental and is only advisable when you the user also does QA/selection on your own: Stresstesting the device over many hours to simulate worst case conditions while keeping an eye on heat dissipation: Without appropriate airflow increasing cpufreq settings above defaults won’t work.

Can someone tell me whether the clock error margin in ARM SoCs today is still the same as reported above for the A20 SoC?
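
A rough way to see why the quoted passage ties extra speed to extra voltage and heat: dynamic CMOS power scales approximately with V² · f. The sketch below is a simplification that ignores static (leakage) power. The function name is made up for illustration, and the 1200 MHz / 1.1 V reference point is a hypothetical value, not a vendor figure; 1608 MHz and 1.4 V are the numbers mentioned earlier in this thread.

```python
def relative_dynamic_power(freq_mhz, volts, ref_freq_mhz, ref_volts):
    """Ratio of dynamic power between two operating points.

    Uses the approximation P_dyn ~ C * V^2 * f; the capacitance C cancels
    out when comparing two operating points of the same core.
    """
    return (volts / ref_volts) ** 2 * (freq_mhz / ref_freq_mhz)


if __name__ == "__main__":
    # Hypothetical reference point: 1200 MHz at 1.1 V (assumption).
    ratio = relative_dynamic_power(1608, 1.4, 1200, 1.1)
    print(f"Relative dynamic power: {ratio:.2f}x")
```

Because voltage enters squared, a modest voltage bump costs more power, and therefore more heat, than the clock gain alone suggests.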

blu
Guest

@Shomari
For a proper Linux, cpuburn is a good stress test. I’m not an Android user so I cannot recommend something similar for Android – generally, Android users are more interested in benchmarks than in burn-in tests.

shomari
Guest

… yeah, there’s nothing (except for proprietary enterprise packages) to stress test live Android systems.

GanjaBear
Guest

@shomari Have you tried compiling cpuburn under Termux? It should work fine and prove a quick and straightforward test.

I’ll have to give it a go myself 🙂

GanjaBear
Guest

**edit:**
Both the Krait and A53 versions don’t need any porting to clang, so yeah, cpuburn can be used out of the box on Android, provided you have Termux installed and install a C compiler with apt install clang.

You can also try c-ray for some numbers.
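
For anyone without a compiler at hand, the idea behind a burn-in test can be sketched in plain Python: repeat a deterministic workload and flag any run whose result differs from the first, since an unstable clock/voltage combination tends to show up as computation errors. This is only an illustration with made-up function names. It loads a single core and is far lighter than cpuburn, so a clean run here says little about worst-case heat or stability.

```python
import hashlib
import time


def workload(rounds=2000):
    """Deterministic hash chain: the same input must always give the same digest."""
    digest = b"seed"
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


def burn_in(duration_s=60.0, rounds=2000):
    """Repeat the workload for duration_s seconds.

    Returns (iterations, mismatches); any mismatch hints at instability.
    """
    reference = workload(rounds)
    deadline = time.monotonic() + duration_s
    iterations = 0
    mismatches = 0
    while True:
        if workload(rounds) != reference:
            mismatches += 1
        iterations += 1
        if time.monotonic() >= deadline:
            break
    return iterations, mismatches


if __name__ == "__main__":
    runs, errors = burn_in(duration_s=10.0)
    print(f"{runs} runs, {errors} mismatches")
```

To load every core, one copy per core could be started from separate shells, though a native tool such as cpuburn remains the more demanding test.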