Compile with ARM Thumb2 to Reduce Memory Footprint and Improve Performance
ARM claims that Thumb-2 instructions (available on ARM Cortex cores and all ARMv7 processors) provide performance improvements and smaller code size:
Thumb-2 technology is the instruction set underlying the ARM Cortex architecture which provides enhanced levels of performance, energy efficiency, and code density for a wide range of embedded applications.
For performance optimized code Thumb-2 technology uses 31 percent less memory to reduce system cost, while providing up to 38 percent higher performance than existing high density code, which can be used to prolong battery-life or to enrich the product feature set. Thumb-2 technology is featured in the processor, and in all ARMv7 architecture-based processors.
Dave Martin (Linaro) recently posted a message entitled “ARM/Thumb-2 kernel size comparison” on the Linaro mailing list:
The question of the size impact of building the kernel in Thumb-2 came
up today, so I extracted some quick numbers:
$ size vmlinux-*
   text    data     bss     dec     hex filename
8420507  463356  826928 9710791  942cc7 vmlinux-arm
6715539  463260  826928 8005727  7a285f vmlinux-thumb2
This is for a recent mainline kernel built with the linaro omap config.
In this case we save about 20% for code and read-only data (i.e.,
text) and 17.5% overall -- which accounts for a little under 2MB saved
in this example.
This doesn't take loadable modules into account; we can probably expect
to see a similar size ratio there.
The results provided by Linaro are not as high as those claimed by ARM, but a 20% code size reduction is still impressive.
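The percentages quoted above can be re-derived from the size output. The snippet below is a minimal sketch: saved_percent is a hypothetical helper name, not part of any tool, and the numbers are copied from the text and dec columns of the listing.

```shell
# saved_percent OLD NEW: print the size reduction from OLD to NEW as a percentage.
# (Hypothetical helper for illustration; LC_ALL=C keeps the decimal point a dot.)
saved_percent() {
    LC_ALL=C awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (old - new) * 100 / old }'
}

saved_percent 8420507 6715539   # text column: prints 20.2
saved_percent 9710791 8005727   # dec column (overall): prints 17.6
```

The absolute difference in the dec column is 9710791 - 8005727 = 1705064 bytes, which matches the "a little under 2MB" mentioned in the message.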
If you want to use Thumb-2 to compile your applications for a Cortex-A8/A9 core with GCC, export the following:
export CFLAGS="-mthumb -march=armv7-a"
You may also add -mtune=cortex-a8 or -mtune=cortex-a9 depending on your core.
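One way to check that these flags really select Thumb-2 is to ask the compiler which Thumb-related macros it predefines. This is only a sketch under assumptions: the cross-compiler name arm-linux-gnueabi-gcc is a guess at a typical toolchain (override CROSS_CC with yours), and thumb_macros is a hypothetical helper.

```shell
# Assumption: a GNU cross toolchain is installed under this name.
CROSS_CC="${CROSS_CC:-arm-linux-gnueabi-gcc}"

# thumb_macros CC [FLAGS...]: list the Thumb-related macros that CC
# predefines when invoked with FLAGS. Returns non-zero if CC is missing
# or no such macro is defined.
thumb_macros() {
    cc="$1"
    shift
    if ! command -v "$cc" >/dev/null 2>&1; then
        echo "compiler not found: $cc" >&2
        return 1
    fi
    # -dM -E dumps the predefined macros without compiling any real code.
    "$cc" "$@" -dM -E -x c /dev/null | grep -i '__thumb'
}

# With the flags above, GCC is expected to list both __thumb__ and __thumb2__.
thumb_macros "$CROSS_CC" -mthumb -march=armv7-a ||
    echo "could not confirm Thumb-2 code generation" >&2
```

If only __thumb__ appears, the selected architecture supports Thumb-1 but not Thumb-2, which usually means -march is set to something older than armv7-a.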
The Linaro team also ran Coremark, an embedded systems benchmark, in January 2011 on a 1 GHz processor featuring a Cortex-A9 core, using different compilation options including arm, armv7-a, thumb and thumb-2.
The best options overall and for armv7-a, Thumb-2 and Thumb-1 were:
- The best overall is -O3 -funroll-loops -marm -march=armv5te -mtune=cortex-a8
- The best armv7-a is -O3 -funroll-loops -marm -march=armv7-a -mtune=cortex-a8 at 95.2% of overall best
- The best Thumb-2 is -O3 -funroll-loops -mthumb -march=armv7-a -mtune=cortex-a8 at 88.7% of overall best
- The best Thumb-1 is -O2 -mthumb -march=armv5te -mtune=cortex-a8 at 64.4% of overall best
Thumb-1 code is slower, but that is to be expected as it focuses on code size optimization. Thumb-2 code should yield similar or even faster results than armv5 code; I suppose it does not yet because the code and the compiler are still being optimized, and Thumb-2 will become faster later on.
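To try the same comparison on your own code, the four flag sets listed above can be turned into build commands. The sketch below only prints the commands (a dry run). The compiler name arm-linux-gnueabi-gcc, the source file bench.c and the helper print_builds are all assumptions made for illustration, not part of the Linaro setup.

```shell
# Assumptions: cross-compiler name and source file are placeholders.
CROSS_CC="${CROSS_CC:-arm-linux-gnueabi-gcc}"
SRC="${SRC:-bench.c}"

# print_builds: emit one build command per flag set from the Coremark results.
print_builds() {
    while IFS='|' read -r name flags; do
        echo "$CROSS_CC $flags -o bench-$name $SRC"
    done <<'EOF'
armv5te|-O3 -funroll-loops -marm -march=armv5te -mtune=cortex-a8
armv7-a|-O3 -funroll-loops -marm -march=armv7-a -mtune=cortex-a8
thumb2|-O3 -funroll-loops -mthumb -march=armv7-a -mtune=cortex-a8
thumb1|-O2 -mthumb -march=armv5te -mtune=cortex-a8
EOF
}

print_builds
```

Piping the output to sh would run the builds; running size on the resulting bench-* binaries and timing them on the target then gives both sides of the size versus speed trade-off for your own workload.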
See the full details at Linaro Coremark Run.