March 21, 2017 by Jean-Luc Aufranc (CNXSoft) - 13 Comments

ARM DynamIQ Improves on big.LITTLE Technology, Supports Up to 8 Heterogeneous Cores in a Single Cluster

ARM unveiled big.LITTLE technology in 2011. It combines clusters of low-power cores, such as Cortex-A7 or A53, with clusters of high-performance cores, such as Cortex-A15 or A72, and the system assigns each task to the best processor for the job in order to optimize battery life. big.LITTLE supports up to 4 cores per cluster, and you cannot mix different types of cores within a single cluster. ARM DynamIQ changes all that: up to 8 cores are supported within one cluster, and you can mix low-power and high-performance cores within a given cluster.
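The article only says the system assigns tasks to the best processor for the job, not how. A minimal sketch of the general idea, assuming a simple load-threshold policy with hysteresis: a task moves up to a big core when its load is high and back down to a LITTLE core when it drops. The threshold values and function names are hypothetical and illustrative only; this is not ARM's or the Linux kernel's actual scheduler.

```python
# Toy model of big.LITTLE task placement (illustrative only).
# A task moves to a big core when its normalized load rises above UP_THRESHOLD
# and back to a LITTLE core when it falls below DOWN_THRESHOLD. The gap between
# the two thresholds (hysteresis) keeps a task from bouncing between core types
# on every small load change. Both values are hypothetical.
UP_THRESHOLD = 0.70
DOWN_THRESHOLD = 0.50


def place(current: str, load: float) -> str:
    """Return 'big' or 'LITTLE' for a task with a normalized load (0.0 to 1.0)."""
    if current == "LITTLE" and load > UP_THRESHOLD:
        return "big"
    if current == "big" and load < DOWN_THRESHOLD:
        return "LITTLE"
    return current  # inside the hysteresis band: stay on the current core type


def trace(loads, start="LITTLE"):
    """Replay a sequence of load samples and record where the task runs."""
    core = start
    history = []
    for load in loads:
        core = place(core, load)
        history.append(core)
    return history


print(trace([0.2, 0.8, 0.6, 0.4, 0.6]))
# ['LITTLE', 'big', 'big', 'LITTLE', 'LITTLE']
```

Note that the 0.6 samples land on different core types depending on where the task already was; that is the hysteresis at work.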

The ARM DynamIQ multicore microarchitecture will be available for all new ARM Cortex-A processors starting this year, and targets automotive, networking, server, and primary compute devices. ARM claims it’s especially advantageous for artificial intelligence thanks to better performance and for autonomous driving thanks to increased safety, and says it allows for much faster response from accelerators. Based on the slide below showing the evolution of multi-core implementations in ARM SoCs, the company might as well have called it ARM’s Just Do What You Want™ multi-core technology, especially as they explain that any configuration is possible, such as 1+7 (1x big and 7x LITTLE CPUs), 2+4, 1+3, etc.

One advantage of putting heterogeneous cores in a single cluster is that migrating tasks from a LITTLE core to a big core is less “expensive” and more efficient, since the cores share the same memory (and cache?). Most big.LITTLE implementations going forward are likely to use a single-cluster design, unless more than 8 cores are needed.
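Taking at face value the claim that any big/LITTLE mix is possible up to 8 cores per cluster, a short sketch enumerating the heterogeneous single-cluster splits. The helper name is hypothetical, and real SoCs may impose tighter limits than the 8-core cap assumed here.

```python
# Enumerate (big, LITTLE) core splits for a single DynamIQ cluster, assuming
# the only constraint is the "up to 8 cores per cluster" limit from the article.
# Real products may restrict this further; the helper name is hypothetical.
MAX_CORES_PER_CLUSTER = 8


def heterogeneous_configs(max_cores=MAX_CORES_PER_CLUSTER):
    """Return (big, little) pairs with at least one core of each type."""
    configs = []
    for total in range(2, max_cores + 1):
        for big in range(1, total):
            configs.append((big, total - big))
    return configs


configs = heterogeneous_configs()
print(len(configs))  # 1 + 2 + ... + 7 = 28 splits
print((1, 7) in configs, (2, 4) in configs, (1, 3) in configs)
```

Every example from the article (1+7, 2+4, 1+3) falls inside this set, while an all-big or all-LITTLE cluster is deliberately excluded since it would not be heterogeneous.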

You may find more details on the DynamIQ technology page, in the related blog post on ARM’s community site, and in the presentation slides.

Jean-Luc Aufranc (CNXSoft)

Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager and starting to write daily news and reviews full time later in 2011.

13 Comments

Shomari

7 years ago

That’s the limit I ran into when making the A53 RK3368 big.LITTLE implementation hack. I’m able to run all cores simultaneously at 1608 MHz, but there are losses from the existing big.LITTLE cluster migration caused by the four-core limit. I mitigated as many of the inefficiencies as possible using HMP tunables, but I would’ve been better off had those cluster limits not existed for the A53.

EDIT: images never work for me here

http://i.imgur.com/uNAV2lp.jpg

http://i.imgur.com/i0SpXMf.jpg

RK3368 – OS 5.1.1

blu

7 years ago

@Shomari
What voltages did you use for the LITTLE cluster? While the big cluster on the 3368 should have no issues at that clock, keeping the LITTLE cluster at 1.608 GHz is not advisable, as it’s a low-clock-tuned design.

Shomari

7 years ago

blu

No problems at all for the past ~7 months on two test boxes. All cores’ voltages max out at 1.4 V. As far as I can surmise, the cores are all identical except, most significantly, for cache sizes. I only ventured into this when I noticed RK’s stock reference voltages were scaling across ALL cores at the same levels, with some configurations taking the LITTLE cores higher than the big ones. You can’t go by what’s in the tables, because RK’s DVFS/throttling methods combine to make some weird things happen on a live system. EDIT: I’ve put out firmware with these hacks for the past…

Theguyuk

7 years ago

I’m sure I read a while back that getting near the limits of Moore’s law was behind the growth in multi-core adoption. Also, that running the performance governor can choke the cores in some implementations. I have no real-world experience of it other than tweaking Android TV box ROMs using Kernel Adiutor (this can brick your TV box! Always have the box firmware at hand and know how to re-flash it).

wj

7 years ago

I’ve always considered big.LITTLE a silly designation. A CPU core should be able to run at high performance with higher energy consumption as well as at low performance with lower energy consumption.

Author

cnxsoft

7 years ago

@wj
I’m not an IC designer, but I don’t think your premise is possible: high-performance and low-power cores call for different optimizations. Otherwise, manufacturers could just put an Intel Core i7 processor in our phones, running at low power on the go and at full power when docked.

blu

7 years ago

@Shomari
The cores in the two clusters are not identical: their operating points are different, and the voltages of the same-clock operating points are different, at least according to the vendor-supplied DVFS tables. What stress tests have you run on the 3368 clocked like this?

Shomari

7 years ago

blu

If I were sticking to vendor-supplied conventions, I would’ve left everything at reference/stock settings. The idea is to get the most performance while maintaining stability. After seven months I still see no issues with daily usage. I haven’t seen any reported among thousands of downloads; we’d have heard of degradation or instability by now. The only feedback has been typical, nothing out of the ordinary.

What stress tests do you recommend?

Theguyuk

7 years ago

I read about overclocking Allwinner A20 chips and came across this quote: “Overclocking, as per the definition of the word, is not possible since that would mean ‘frequency scaling beyond the specs’. The problem with cheap ARM SoCs is that they’re not subject to an expensive QA/selection process where each SoC will be tested extensively and then labeled/sold in different categories depending on the upper limits it’s able to achieve (this is standard with x86 for example: chips from the same wafer will be tested individually and be sold for a few hundred bucks more or less depending…

blu

7 years ago

@Shomari
On a proper Linux system, cpuburn is a good stress test. I’m not an Android user, so I cannot recommend something similar for Android; generally, Android users are more interested in benchmarks than in burn-in tests.

shomari

7 years ago

blu

… yeah, there’s nothing (except for proprietary enterprise packages) to stress test live Android systems.

GanjaBear

7 years ago

@shomari Have you tried compiling cpuburn under Termux? It should work fine and make for a quick, straightforward test.

I’ll have to give it a go myself 🙂

GanjaBear

7 years ago

**edit:**
Neither the Krait nor the A53 version needs any porting to Clang, so yeah, cpuburn can be used out of the box on Android, provided you have Termux installed and install a C compiler with apt install clang.

You can also try c-ray for some numbers.