Compile with ARM Thumb2 to Reduce Memory Footprint and Improve Performance

ARM claims that Thumb-2 instructions (supported by ARM Cortex cores and all ARMv7 processors) improve performance and reduce code size:

Thumb-2 technology is the instruction set underlying the ARM Cortex architecture which provides enhanced levels of performance, energy efficiency, and code density for a wide range of embedded applications.

For performance optimized code Thumb-2 technology uses 31 percent less memory to reduce system cost, while providing up to 38 percent higher performance than existing high density code, which can be used to prolong battery-life or to enrich the product feature set. Thumb-2 technology is featured in the processor, and in all ARMv7 architecture-based processors.

Dave Martin (Linaro) recently posted a message entitled “ARM/Thumb-2 kernel size comparison” on the Linaro mailing list:

The question of the size impact of building the kernel in Thumb-2 came
up today, so I extracted some quick numbers:

$ size vmlinux-*
text	   data	    bss	    dec	    hex	filename
8420507	 463356	 826928	9710791	 942cc7	vmlinux-arm
6715539	 463260	 826928	8005727	 7a285f	vmlinux-thumb2

This is for a recent mainline kernel built with the linaro omap config.

In this case we save about 20% for code and read-only data (i.e.,
text) and 17.5% overall -- which accounts for a little under 2MB saved
in this example.

This doesn't take loadable modules into account; we can probably expect
to see a similar size ratio there.
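The percentages in the message can be recomputed directly from the size output. Below is a minimal sketch, assuming binutils size in its default Berkeley format (text, data, bss, dec, hex, filename), with the ARM build on the first data line and the Thumb-2 build on the second. The function name size_savings is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads size output (Berkeley format) on stdin,
# ARM build on the first data line, Thumb-2 build on the second, and
# prints how much smaller the Thumb-2 build is.
size_savings() {
    awk '
        $1 == "text" { next }                 # skip the header line
        { text[++n] = $1; dec[n] = $4 }       # column 1 = text, column 4 = dec (total)
        END {
            saved_text = text[1] - text[2]
            saved_all  = dec[1] - dec[2]
            printf "text:  %.2f%% smaller (%d bytes)\n", 100 * saved_text / text[1], saved_text
            printf "total: %.2f%% smaller (%d bytes, %.2f MiB)\n", 100 * saved_all / dec[1], saved_all, saved_all / 1048576
        }'
}

# Example with the numbers from the message above. On a real system you
# could pipe "size vmlinux-arm vmlinux-thumb2" into size_savings instead.
size_savings <<'EOF'
   text    data     bss     dec     hex filename
8420507  463356  826928 9710791  942cc7 vmlinux-arm
6715539  463260  826928 8005727  7a285f vmlinux-thumb2
EOF
```

With these numbers it reports about 20.25% less text and 17.56% less in total, i.e. 1705064 bytes (about 1.63 MiB), which matches the rounded "about 20%", "17.5%" and "a little under 2MB" in the message.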


The results provided by Linaro are not as high as those claimed by ARM (about 20% smaller code versus ARM's 31% memory savings), but a 20% code size reduction is still impressive.

If you want to compile your applications in Thumb-2 mode for a Cortex-A8 or Cortex-A9 core with GCC, export the following:

export CFLAGS="-mthumb -march=armv7-a"

You may also add -mtune=cortex-a8 or -mtune=cortex-a9 depending on your core.
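To avoid typos in those flags, and to check that a given cross-compiler really generates Thumb-2 code with them, something like the following sketch may help. It assumes a GCC toolchain and relies on GCC predefining the __thumb2__ macro when it targets Thumb-2; the function names thumb2_cflags and compiler_emits_thumb2 are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the Thumb-2 CFLAGS for a Cortex-A8 or Cortex-A9 core.
# Other cores are rejected rather than guessed.
thumb2_cflags() {
    case "$1" in
        cortex-a8|cortex-a9)
            printf '%s\n' "-mthumb -march=armv7-a -mtune=$1"
            ;;
        *)
            echo "thumb2_cflags: unsupported core '$1'" >&2
            return 1
            ;;
    esac
}

# Hypothetical helper: succeed if compiler $1, given the remaining arguments
# as flags, predefines __thumb2__ (i.e. it would emit Thumb-2 code).
compiler_emits_thumb2() {
    cc=$1
    shift
    "$cc" "$@" -dM -E -x c /dev/null | grep -q '__thumb2__'
}
```

For example, CFLAGS=$(thumb2_cflags cortex-a9) followed by export CFLAGS sets the flags from the text, and compiler_emits_thumb2 arm-linux-gnueabi-gcc $CFLAGS (the compiler name is only an assumption, use your own toolchain prefix) succeeds only if that compiler is actually in Thumb-2 mode. $CFLAGS is left unquoted there on purpose so it splits into separate flags.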

In January 2011, the Linaro team also ran CoreMark, an embedded systems benchmark, with different compilation options (ARM, ARMv7-A, Thumb and Thumb-2) on a 1 GHz processor with a Cortex-A9 core.

The best options overall and for ARMv7-A, Thumb-2 and Thumb-1 were:

The best overall is -O3 -funroll-loops -marm -march=armv5te -mtune=cortex-a8
The best armv7-a is -O3 -funroll-loops -marm -march=armv7-a -mtune=cortex-a8 at 95.2% of the overall best
The best Thumb-2 is -O3 -funroll-loops -mthumb -march=armv7-a -mtune=cortex-a8 at 88.7% of the overall best
The best Thumb-1 is -O2 -mthumb -march=armv5te -mtune=cortex-a8 at 64.4% of the overall best
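Since every score is given as a percentage of the overall best, the direct ARM versus Thumb-2 comparison at the same architecture level is easy to miss. The sketch below simply rescales the four reported figures against the best armv7-a ARM build; the labels and the function name relative_scores are illustrative, and the input is just the percentages listed above.

```shell
#!/bin/sh
# Hypothetical helper: reads "label percent-of-overall-best" lines on stdin
# and also expresses each score relative to the line labelled armv7-a.
relative_scores() {
    awk '
        { label[NR] = $1; score[NR] = $2; if ($1 == "armv7-a") base = $2 }
        END {
            for (i = 1; i <= NR; i++)
                printf "%-8s %5.1f%% of overall best, %5.1f%% of best armv7-a\n",
                       label[i], score[i], 100 * score[i] / base
        }'
}

# The four results listed above.
relative_scores <<'EOF'
armv5te 100.0
armv7-a 95.2
thumb-2 88.7
thumb-1 64.4
EOF
```

With these numbers, the best Thumb-2 build reaches about 93.2% of the best armv7-a ARM build (the only flag difference being -mthumb instead of -marm), roughly a 7% slowdown, while the best Thumb-1 build is at about 67.6%.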

Thumb-1 code is slower, but that is to be expected since Thumb-1 focuses on code size. Thumb-2 code should yield similar or even better results than ARMv5 code; I suppose it doesn't yet because the code and the compiler are still being optimized, and Thumb-2 will get faster later on.

See the full details at Linaro Coremark Run.

Jean-Luc Aufranc (CNXSoft)

Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager and starting to write daily news and reviews full time later in 2011.