Compile with ARM Thumb2 to Reduce Memory Footprint and Improve Performance

ARM claims that Thumb-2 instructions (for ARM Cortex cores and all ARMv7 processors) provide performance improvements and smaller code size:

Thumb-2 technology is the instruction set underlying the ARM Cortex architecture which provides enhanced levels of performance, energy efficiency, and code density for a wide range of embedded applications.

For performance optimized code Thumb-2 technology uses 31 percent less memory to reduce system cost, while providing up to 38 percent higher performance than existing high density code, which can be used to prolong battery-life or to enrich the product feature set. Thumb-2 technology is featured in the  processor, and in all ARMv7 architecture-based processors.

Dave Martin (Linaro) recently posted a message entitled “ARM/Thumb-2 kernel size comparison” on the Linaro mailing list:

The question of the size impact of building the kernel in Thumb-2 came
up today, so I extracted some quick numbers:

$ size vmlinux-*
text	   data	    bss	    dec	    hex	filename
8420507	 463356	 826928	9710791	 942cc7	vmlinux-arm
6715539	 463260	 826928	8005727	 7a285f	vmlinux-thumb2

This is for a recent mainline kernel built with the linaro omap config.

In this case we save about 20% for code and read-only data (i.e.,
text) and 17.5% overall -- which accounts for a little under 2MB saved
in this example.

This doesn't take loadable modules into account; we can probably expect
to see a similar size ratio there.

The results provided by Linaro are not as high as those claimed by ARM, but a 20% code size reduction is still impressive.
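
The percentages in the quoted message can be rechecked from the size output. The snippet below is only a sketch: pct_saved is a hypothetical helper name, and the byte counts are copied from the text and dec columns above.

```shell
#!/bin/sh
# pct_saved is a hypothetical helper (not from the original post):
# percentage reduction from size $1 to size $2, to one decimal place.
pct_saved() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) * 100 / a }'
}

# Byte counts copied from the `size vmlinux-*` output quoted above.
text_saved=$(pct_saved 8420507 6715539)    # text column (code + read-only data)
total_saved=$(pct_saved 9710791 8005727)   # dec column (text + data + bss)

echo "text saved:  ${text_saved}%"
echo "total saved: ${total_saved}%"
```

The result lines up with the roughly 20% and 17.5% quoted in the message, and the absolute difference in the dec column is about 1.7 million bytes, i.e. "a little under 2MB".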

If you want to use Thumb-2 to compile your applications for a Cortex-A8/A9 core with GCC, export the following:

export CFLAGS="-mthumb -march=armv7-a"

You may also add -mtune=cortex-a8 or -mtune=cortex-a9 depending on your core.
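
One way to combine the base flags with the optional -mtune setting is a small shell function. This is a minimal sketch: thumb2_cflags is a hypothetical helper name, and it only accepts the two cores mentioned above.

```shell
#!/bin/sh
# thumb2_cflags is a hypothetical helper (not part of GCC or any toolchain).
# It prints Thumb-2 flags for ARMv7-A and appends -mtune for known cores.
thumb2_cflags() {
    base="-mthumb -march=armv7-a"
    case "$1" in
        cortex-a8|cortex-a9) echo "$base -mtune=$1" ;;
        *)                   echo "$base" ;;   # unknown or empty: no tuning flag
    esac
}

# Example: tune for a Cortex-A9 core.
CFLAGS="$(thumb2_cflags cortex-a9)"
export CFLAGS
echo "$CFLAGS"
```

Build systems that honour CFLAGS should then pick up these options, provided the compiler in use is an ARM cross compiler or a native ARM GCC.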

The Linaro team also ran CoreMark, an embedded systems benchmark, with different compilation options, including ARM, ARMv7-A, Thumb and Thumb-2, in January 2011 on a 1 GHz processor featuring a Cortex-A9 core.

The best options overall and for ARMv7-A, Thumb-2 and Thumb-1:

  • The best overall is -O3 -funroll-loops -marm -march=armv5te -mtune=cortex-a8
  • The best armv7-a is -O3 -funroll-loops -marm -march=armv7-a -mtune=cortex-a8 at 95.2% of overall best
  • The best Thumb-2 is -O3 -funroll-loops -mthumb -march=armv7-a -mtune=cortex-a8 at 88.7% of overall best
  • The best Thumb-1 is -O2 -mthumb -march=armv5te -mtune=cortex-a8 at 64.4% of overall best
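
To try the four option sets above on your own code, a loop along these lines may help. It is only a sketch: the compiler name arm-linux-gnueabi-gcc, the source file bench.c and the output names are placeholders, not details of the Linaro run, and the commands are printed rather than executed.

```shell
#!/bin/sh
# CROSS_CC and bench.c are hypothetical placeholders; adjust to your toolchain.
CROSS_CC="${CROSS_CC:-arm-linux-gnueabi-gcc}"

# label|flags pairs; the flags are copied from the list above.
variants='armv5te-arm|-O3 -funroll-loops -marm -march=armv5te -mtune=cortex-a8
armv7a-arm|-O3 -funroll-loops -marm -march=armv7-a -mtune=cortex-a8
thumb2|-O3 -funroll-loops -mthumb -march=armv7-a -mtune=cortex-a8
thumb1|-O2 -mthumb -march=armv5te -mtune=cortex-a8'

# Print one compile command per variant (nothing is compiled here,
# so the sketch runs even without a cross toolchain installed).
print_build_commands() {
    printf '%s\n' "$variants" | while IFS='|' read -r label flags; do
        echo "$CROSS_CC $flags -o bench-$label bench.c"
    done
}

print_build_commands
```

Running the resulting binaries on the target and comparing timings, together with the size tool for the footprint, gives a rough speed versus size picture for each variant.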

Thumb-1 code is slower, but that is to be expected since it focuses on code size. Thumb-2 code should yield results similar to, or even faster than, ARMv5 code; I suppose the gap exists because the code and compiler are still being optimized, and Thumb-2 will become faster later on.

See the full details at Linaro Coremark Run.
