The team at Linaro has done an amazing job optimizing Android 4.0 for ARM, and Bernhard Rosenkränzer, Android Engineer at Linaro among other things, put those optimizations together for a demo at Linaro Connect Q2.2012 in Hong Kong with two PandaBoards:
- One running Stock Android 4.0.4, the one released by Google (AOSP)
- One running Linaro Android 4.0.4
The hardware, Android version and benchmark software (0xBench) are the same on both boards, and the results are quite amazing, with Linaro Android achieving about 60 fps in all 0xBenchmark tests (OpenGL Cube, OpenGL Blending, OpenGL Fog and Flying Teapot) whereas stock Android achieves about 30 fps. They selected a benchmark tool that is mainly CPU bound, as they cannot optimize the GPU code since they can only access binary blobs.
Apparently, most of the improvements were possible thanks to toolchain optimizations, and the code fixes needed to build with them, such as using gcc 4.7 and building Android ICS without -fno-strict-aliasing and with the -O3 compiler flag (first released in Linaro 12.01). So that means that, for this particular benchmark, they managed to double Android performance just by “tweaking” the software. [Important update: A detailed analysis shows the benchmark is somewhat flawed, and it’s a VSync effect that makes Android appear to be twice as fast. In reality the improvements are in the 20 to 30% range, which is still very good for software optimizations alone. Bero also notes that a 100% speed improvement may still happen in real apps, such as 3D games, that also wait for VSync to refresh the screen].
I know you don’t believe me… So have a look at the video below. 🙂 (Source: Charbax)
[Update: If you want to try it yourself and access the toolchain and source, please see the comment by Bernhard Rosenkraenzer (Bero) below, with all the information you need.]
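To see how a 20 to 30% improvement can show up as a doubled frame rate, here is a rough sketch of the VSync effect mentioned in the update above. It assumes a 60 Hz display and simple double buffering, where a frame that misses a refresh has to wait for the next one. The `displayed_fps` function and the render times are made up for illustration; they are not measurements from the demo.

```python
import math

REFRESH_HZ = 60.0  # assumed display refresh rate


def displayed_fps(render_ms: float, refresh_hz: float = REFRESH_HZ) -> float:
    """Frame rate seen on screen when every buffer swap waits for VSync.

    Assumes simple double buffering: a frame that misses one refresh is
    held until the next, so the frame time is rounded up to a whole
    number of refresh intervals.
    """
    interval_ms = 1000.0 / refresh_hz
    intervals = max(1, math.ceil(render_ms / interval_ms))
    return refresh_hz / intervals


# Hypothetical render times, not measurements from the demo.
stock_ms = 18.0                 # just misses the ~16.67 ms deadline
optimized_ms = stock_ms / 1.25  # 25% higher raw throughput: 14.4 ms

print(displayed_fps(stock_ms))      # 30.0
print(displayed_fps(optimized_ms))  # 60.0
```

With these made-up numbers, the stock build needs slightly more than one refresh interval per frame and gets locked to 30 fps, while the optimized build fits inside one interval and runs at 60 fps. The same rounding means that a build already rendering comfortably under the deadline would show no change at all.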
Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager, and starting to write daily news, and reviews full time later in 2011.
30 Replies to “Linaro Android Puts Stock Android To Shame on TI Pandaboard (OMAP4430)”
So, if this is true and Bero publicizes what he has done, can the CyanogenMod people start using the method and make the CM on my phone twice as fast?
If it is true, I guess Google will be interested too in his method and in hiring Bero?
I say “if this is true”, because “twice as fast” is quite a claim.
You can take a look at the video, and even download the OS images we’ve used to do the benchmark.
Obviously saying we’ve made it “twice as fast” is a bit of an oversimplification.
This particular benchmark (the 3D benchmark included in 0xbench) runs twice as fast on this particular hardware. Other benchmarks (e.g. Sunspider) are “merely” 30% faster, some others are only slightly faster (e.g. GLMark2 – as it’s mostly GPU bound), and it would be possible to craft a benchmark showing that our build is 10 times faster (write a benchmark that uses strcpy, memset and friends heavily, which I’ve actually done, not to show off but to test if our changes are as beneficial as we’re hoping).
CM people can obviously pick up what we’ve done – so can Google, we’ll be submitting the changes to AOSP shortly.
The Linaro toolchain has been out in the open forever, Linux binaries are here, sources are here.
Linaro’s Android builds (that have all the changes required to make use of the Linaro toolchain efficiently) have been around forever, here.
The kernel source we’re using is available here.
The only thing we did for this demo that isn’t in the open yet is a rewrite of the string routines in Bionic, based on Linaro’s Cortex Strings library.
This will be added to our builds shortly, the only reason it isn’t there is that I had to do a bit of sloppy coding in order to get things done for the demo (merging the patchset as it is right now would give Cortex A9 all the improvements shown in the demo, but it would leave ARMv5 users with an unbootable build — but we wanted to show something at the end of the Linaro Connect event). The patchset will go in as soon as I’ve cleaned it up a bit more.
Some press coverage puts me a bit too much at the center of things – it’s not like I single-handedly made things twice as fast. This is the work of Linaro as a whole, I’m just the one who put the various bits and pieces together, merging them all into one usable build. Google would be much better off working closer with Linaro than attempting to hire me or anyone of the others involved off.
@ Bernhard Rosenkraenzer (Bero)
Thanks for the detailed comment. I’ve also updated the wording in my post to explicitly say it was a team effort, rather than a one-man effort.
Will you make the optimization open source, and allow Google to use it in future versions of Android?
You can read Bero’s comment above.
Bero and his teams certainly deserve an epic thumbs up for this work! Thank you kind sirs for fighting the good fight and making our phones faster. Hope who needs to see/hear this, will.
“it’s -O3 the letter, not -03 the number”
Yes, it is already like this in the post, but maybe the font used in this WordPress theme can lead to confusion between 0 and O.
Will this optimisation be applicable only to TI OMAP processors, or to any ARM device, such as Qualcomm Snapdragon S2/S3/S4?
Yes, these optimizations are for Cortex A processors (ARMv7 architecture), so they should also work for other ARMv7 processors, although the benchmark results might differ between SoCs.
Yes, that’s a certainty. The CM team can make the necessary improvements in their current CM9 versions even before Google decides to update it (that is, if they choose to optimize their version with this one)
This is awesome! Good hack! 🙂 I can’t wait to run this on my Galaxy nexus
This was a subtle reference to some Gentoo humor from a few years ago.
Will this be possible on the terribly slow and buggy LG optimus 2X (dual) with tegra 2?
Tegra 2 is a dual cortex A9, so it’s technically possible. LG won’t provide a firmware update, but you should be able to install CM9 as I’m pretty sure those optimizations will end up in CyanogenMod.
http://thiemonagel.de/2010/01/no-strict-aliasing/ explains why you might NOT want this ‘improvement’. I admit to no real knowledge of any of this, except when I heard it went faster I automatically expected to be able to find on Google an instance of why security or quality of code would argue against this
@ Bernhard Rosenkraenzer (Bero)
You guys are amazing.
@ Dennis Farr
Yes, it’s safer with -fno-strict-aliasing, but Linaro fixed the code that may cause those issues, and they seem to have a pretty good QA.
Those optimizations have already been ported to CM9, it seems they just need approval now.
Isn’t this like a kick in the head for aosp dev??
aosp dev should hire this team
It’s just that the AOSP team is focusing on other parts. Linaro has several teams, including a team dedicated to the ARM toolchain and an ARM Android team. The no-strict-aliasing “bug” was filed in August last year, and since then the guys at Linaro have brought some improvements. I’m pretty sure the AOSP team was aware of this work, and knew they did not need to work on it themselves.
great news. will this be possible on Galaxy S (Hummingbird, Cortex-A8)?
It’s technically possible on all Cortex-A devices.
This is great performance!
Even after clicking “mobile off” at the bottom of the page, it still does not switch to the desktop site.
I’m on a Dell Streak with official Android 2.2.2, and tried all available browsers.
Please fix this issue.
Thanks & Regards
* I’ve never had this problem with charbax’s blog.
I can also see the issue on my Android Tablet. It works as a logged-in user, but fails with normal visitors. This must be a caching issue. I’ll try to find time to have a look at it this week.
I’ve updated the post following this Google+ post, https://plus.google.com/111881818033837980275/posts/4TZtUSQijkr, which explains why the benchmark results are what they are, although there is usually a 20 to 30% improvement with other benchmarks.
I’ve disabled caching for mobile devices, you can try it. It works on my Android tablet with Opera Mini.