Compile with ARM Thumb-2 to Reduce Memory Footprint and Improve Performance

ARM claims that Thumb-2 instructions (for ARM Cortex cores and all ARMv7 processors) provide performance improvements and code size optimization:

Thumb-2 technology is the instruction set underlying the ARM Cortex architecture which provides enhanced levels of performance, energy efficiency, and code density for a wide range of embedded applications.

For performance optimized code Thumb-2 technology uses 31 percent less memory to reduce system cost, while providing up to 38 percent higher performance than existing high density code, which can be used to prolong battery-life or to enrich the product feature set. Thumb-2 technology is featured in the  processor, and in all ARMv7 architecture-based processors.

Dave Martin (Linaro) has recently posted a message entitled “ARM/Thumb-2 kernel size comparison” on the Linaro mailing list.


The results provided by Linaro are not as high as those claimed by ARM, but a 20% code size reduction is still impressive.
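To see what kind of reduction you get on your own code, you can build the same source file in ARM and Thumb-2 mode and compare the text sizes. The script below is only a rough sketch: the toolchain prefix arm-linux-gnueabi- and the source file hello.c are assumptions, so adjust them to your setup. The build step is skipped if either is missing.

```shell
#!/bin/sh
# Print the percentage by which thumb_bytes is smaller than arm_bytes.
size_reduction_pct() {
    awk -v arm="$1" -v thumb="$2" \
        'BEGIN { printf "%.1f\n", (arm - thumb) * 100 / arm }'
}

# Assumed names: change CROSS and SRC to match your toolchain and source.
CROSS="${CROSS:-arm-linux-gnueabi-}"
SRC="${SRC:-hello.c}"

if command -v "${CROSS}gcc" >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # Same source and optimization level, only the instruction set differs.
    "${CROSS}gcc" -O2 -marm   -march=armv7-a -c "$SRC" -o arm.o
    "${CROSS}gcc" -O2 -mthumb -march=armv7-a -c "$SRC" -o thumb.o

    # size prints a header line, then text/data/bss; keep the text column.
    arm_text=$("${CROSS}size" arm.o | awk 'NR == 2 { print $1 }')
    thumb_text=$("${CROSS}size" thumb.o | awk 'NR == 2 { print $1 }')

    echo "ARM text: $arm_text bytes, Thumb-2 text: $thumb_text bytes"
    echo "Reduction: $(size_reduction_pct "$arm_text" "$thumb_text")%"
fi
```

A single object file is a much smaller sample than a full kernel build, so expect the figure to vary from one source file to another.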

If you want to use Thumb-2 to compile your applications for a Cortex-A8/A9 core with GCC, export the following:

export CFLAGS="-mthumb -march=armv7-a"

You may also add -mtune=cortex-a8 or -mtune=cortex-a9 depending on your core.
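If you build for more than one board, a small shell helper can assemble these flags for you. This is just a sketch: the function name thumb2_cflags is made up for illustration, and it only knows the two cores mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: print Thumb-2 CFLAGS, adding -mtune for known cores.
thumb2_cflags() {
    core="${1:-}"
    case "$core" in
        cortex-a8|cortex-a9)
            echo "-mthumb -march=armv7-a -mtune=$core" ;;
        *)
            # Unknown or missing core: fall back to the generic ARMv7-A flags.
            echo "-mthumb -march=armv7-a" ;;
    esac
}

# Example: target a Cortex-A8 based board.
CFLAGS="$(thumb2_cflags cortex-a8)"
export CFLAGS
echo "CFLAGS=$CFLAGS"
```

Most configure scripts and Makefiles pick up CFLAGS from the environment, but some override it, so check the actual compiler command line in your build log.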

The Linaro team also ran CoreMark, an embedded systems benchmark, with different compilation options, including arm, armv7-a, thumb and thumb-2, in January 2011 on a 1 GHz processor featuring a Cortex-A9 core.

The best options overall, and for armv7-a, thumb-2 and thumb-1:

  • The best is -O3 -funroll-loops -marm -march=armv5te -mtune=cortex-a8
  • The best armv7-a is -O3 -funroll-loops -marm -march=armv7-a -mtune=cortex-a8 at 95.2% of overall best
  • The best Thumb-2 is -O3 -funroll-loops -mthumb -march=armv7-a -mtune=cortex-a8 at 88.7% of overall best
  • The best Thumb-1 is -O2 -mthumb -march=armv5te -mtune=cortex-a8 at 64.4% of overall best

Thumb-1 code is slower, but that should be expected as it focuses on code size optimization. Thumb-2 code should yield similar or even faster results than armv5 code, yet here it is slower. I suppose that’s because they are still optimizing the code / compiler, and later on Thumb-2 will be faster.

See the full details at Linaro Coremark Run.
