Big.LITTLE Processing Implementations and Current Status
A big.LITTLE mini-summit was held during Linaro Connect Europe 2012, with an update on current big.LITTLE implementations and the results of power versus performance measurements.
As briefly mentioned in “Versatile Express TC2 (2xA15, 3xA7) Development Board at ARM Techcon 2012”, there are two big.LITTLE implementations:
- In-kernel switcher (IKS)
This implementation is already available through Linaro and only required minimal changes to the kernel, as it is mainly an augmentation of DVFS (Dynamic Voltage and Frequency Scaling): instead of only adjusting voltage and frequency depending on the load, it also moves the load to a different type of core. The main drawback is that this implementation only uses half the cores. For example, on a 2x Cortex A15 / 2x Cortex A7 system, it can only use 2 cores at the same time (either A15 or A7 cores), because the load is moved from one type of core to the other rather than spread across both.
- Heterogeneous MultiProcessing (HMP)
This implementation uses all available cores, but it requires major changes to the kernel. A basic implementation will only be available in Q1 2013, with optimization and upstreaming taking several more months. Instead of using system load, HMP can track the load of individual tasks and distribute the tasks to the best cores for the job.
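The difference between the two approaches can be illustrated with a minimal sketch. This is a toy model, not Linaro's code: the core names, the `SWITCH_LOAD` threshold and both function names are assumptions made up for the example.

```python
# Hypothetical model of a 2x Cortex-A15 / 2x Cortex-A7 system.
BIG = ["A15-0", "A15-1"]
LITTLE = ["A7-0", "A7-1"]
SWITCH_LOAD = 0.7  # assumed threshold, not a value from the presentation


def iks_usable_cores(system_load):
    """IKS: the whole load runs on one core type, so only half the cores are usable."""
    return BIG if system_load >= SWITCH_LOAD else LITTLE


def hmp_place(task_loads):
    """HMP: each task is placed by its own load, so all four cores stay usable."""
    placement = {}
    next_big = next_little = 0
    for task, load in task_loads.items():
        if load >= SWITCH_LOAD:
            placement[task] = BIG[next_big % len(BIG)]
            next_big += 1
        else:
            placement[task] = LITTLE[next_little % len(LITTLE)]
            next_little += 1
    return placement


tasks = {"browser": 0.9, "mp3": 0.1}
print(iks_usable_cores(system_load=0.9))  # everything runs on the A15 cores
print(hmp_place(tasks))                   # heavy task on an A15, light task on an A7
```

In this model a heavy browser task drags the light MP3 task onto the A15 cores under IKS, while HMP can leave the MP3 task on an A7.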
For simple tasks such as audio decoding, IKS works perfectly: the task runs entirely on a Cortex A7 core, with nearly the same power consumption as a single A7 processor and 70% less power than a Cortex A15 core.
For more complex tasks, such as simultaneously browsing webpages (BBench) and listening to music, IKS provides around 90% of the performance provided by Cortex A15, but consumes 30 to 40% less power.
There are two IKS implementations in this chart:
- The original IKS with one “cut-off” frequency (go_hispeed_load) to switch between Cortex A7 and Cortex A15
- IKS_HS2 implementation with one extra “cut-off” frequency (go_hispeed_load2) to limit the frequency on Cortex A15 core, since power consumption really shoots up over 1 GHz (overdrive).
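The two variants can be sketched as follows. Only the names go_hispeed_load and go_hispeed_load2 and the 1 GHz limit come from the description above; the threshold values, the maximum frequencies and the function names are assumptions chosen for illustration.

```python
# All numeric values except the 1 GHz cap are illustrative assumptions.
GO_HISPEED_LOAD = 0.70    # cut-off between Cortex-A7 and Cortex-A15
GO_HISPEED_LOAD2 = 0.95   # IKS_HS2 only: cut-off above which overdrive is allowed
A7_MAX_KHZ = 1_000_000    # assumed Cortex-A7 maximum frequency
A15_CAP_KHZ = 1_000_000   # 1 GHz, above which A15 power consumption shoots up
A15_MAX_KHZ = 1_200_000   # assumed Cortex-A15 overdrive frequency


def iks(load):
    """Original IKS: a single cut-off selects the core type."""
    if load < GO_HISPEED_LOAD:
        return "A7", A7_MAX_KHZ
    return "A15", A15_MAX_KHZ


def iks_hs2(load):
    """IKS_HS2: a second cut-off keeps the A15 at or below 1 GHz for moderate loads."""
    if load < GO_HISPEED_LOAD:
        return "A7", A7_MAX_KHZ
    if load < GO_HISPEED_LOAD2:
        return "A15", A15_CAP_KHZ
    return "A15", A15_MAX_KHZ


for load in (0.3, 0.8, 0.99):
    print(load, iks(load), iks_hs2(load))
```

Each function returns the selected core type and the highest frequency (in kHz) that core is allowed to reach. The two variants only differ for loads between the two cut-offs, where IKS_HS2 stays out of overdrive.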
Linaro has also started work on the second implementation, HMP. Their experimental implementation treats big and LITTLE CPUs as separate scheduling domains, uses Paul Turner's (PJT) load-tracking patches to track individual task load, and migrates tasks between the big and LITTLE domains based on that load.
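Load-based migration between the two domains can be sketched roughly as below. This is a simplified model, not the actual patches: the decay factor, both thresholds and the `Task` class are assumptions for the example.

```python
DECAY = 0.5 ** (1 / 32)   # assumed: a period's contribution halves every 32 periods
UP_THRESHOLD = 0.8        # hypothetical: move to the big domain above this load
DOWN_THRESHOLD = 0.3      # hypothetical: move back to LITTLE below this load


class Task:
    def __init__(self, name):
        self.name = name
        self.load = 0.0          # tracked load, between 0.0 and 1.0
        self.domain = "LITTLE"   # tasks start on the LITTLE domain in this model

    def account(self, runnable_fraction):
        """Fold one period (fraction of time runnable) into a decayed average."""
        self.load = self.load * DECAY + runnable_fraction * (1 - DECAY)

    def migrate_if_needed(self):
        """Switch domain only when the tracked load crosses a threshold."""
        if self.domain == "LITTLE" and self.load > UP_THRESHOLD:
            self.domain = "big"
        elif self.domain == "big" and self.load < DOWN_THRESHOLD:
            self.domain = "LITTLE"
        return self.domain


browser = Task("browser")
for _ in range(100):          # busy for 100 periods
    browser.account(1.0)
print(browser.migrate_if_needed(), round(browser.load, 2))

mp3 = Task("mp3")
for _ in range(100):          # runnable 10% of the time
    mp3.account(0.1)
print(mp3.migrate_if_needed(), round(mp3.load, 2))
```

Using two thresholds rather than one adds hysteresis, so a task whose load hovers around a single cut-off does not bounce between domains.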
MP3 playback power benchmarking found that HMP uses 39.86% of the power this task requires on Cortex A15, compared to only 30.79% for a processor with only Cortex A7 cores.
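To put these two figures in perspective, the short calculation below treats the Cortex A15 power for MP3 playback as a baseline of 100 and derives the saving and the overhead from the two percentages. The variable names are made up for the example; only the two percentages come from the benchmark.

```python
A15_ONLY = 100.0   # baseline: MP3 playback on Cortex-A15
HMP = 39.86        # HMP, as a percentage of the baseline
A7_ONLY = 30.79    # Cortex-A7 only, as a percentage of the baseline

saving_vs_a15 = A15_ONLY - HMP          # percentage points saved against A15
overhead_points = HMP - A7_ONLY         # percentage points lost against A7 only
overhead_ratio = HMP / A7_ONLY - 1.0    # relative overhead against A7 only

print(f"HMP saving vs A15:      {saving_vs_a15:.2f} points")
print(f"HMP overhead vs A7:     {overhead_points:.2f} points")
print(f"HMP relative overhead:  {overhead_ratio:.1%}")
```

In other words, HMP still saves about 60% against the A15, but draws roughly 29% more power than an A7-only system for the same task.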
The extra power is consumed by the Cortex A15 cores. Although there is no user task running on the A15s, unwarranted wake-ups (tick_sched_timer, timers, workqueue…) occur on those cores. This will be resolved by implementing CPU wakeup prioritization in order to pick the “cheapest” CPU. Other improvements to HMP will include global balancing (spreading load to the A7s when the A15s are overloaded) and the implementation of cluster-aware cpufreq governors.
If you want to know more about big.LITTLE implementations, you can read two other technical presentations from LCE12, “Bluesky: What would the ideal power-aware kernel do?” and “Handling big.LITTLE Core and Cluster Shutdowns on ARM”, and follow the work done on the Linaro Big Little Switcher page.