Posts Tagged ‘neon’

ARM Releases Ne10: An Open Source Library with NEON Optimized Functions

March 29th, 2012

The Advanced SIMD extension (aka NEON or “MPE” Media Processing Engine) is a combined 64- and 128-bit single instruction multiple data (SIMD) instruction set that provides standardized acceleration for media and signal processing applications for ARM Cortex A (ARMv7) processors and the goal of these instructions is similar to MMX, SSE and 3DNow! extensions for x86 processors.
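To make the SIMD idea concrete, here is a minimal sketch in C using the NEON intrinsics from arm_neon.h. A single vaddq_f32 adds four pairs of 32-bit floats held in 128-bit registers. The function name add_f32 is hypothetical. The sketch assumes a compiler that predefines __ARM_NEON or __ARM_NEON__ when NEON is enabled; on other targets only the scalar loop is compiled.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for n floats.
 * On a NEON-enabled build, each vaddq_f32 adds four floats held in one
 * 128-bit register; elsewhere, only the scalar loop below runs. */
void add_f32(float *out, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* load 4 floats from a */
        float32x4_t vb = vld1q_f32(b + i);      /* load 4 floats from b */
        vst1q_f32(out + i, vaddq_f32(va, vb));  /* 4 additions, 1 instruction */
    }
#endif
    /* Scalar tail (and the whole loop on non-NEON targets). */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

The scalar tail handles lengths that are not a multiple of four. It also keeps the same source file buildable for x86 hosts.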

Since early 2011, ARM has been working internally on a project codenamed Snappy to develop common functions accelerated by NEON. ARM has now released the first version of Snappy, renamed the Ne10 library, which is available on GitHub at https://github.com/projectNe10/Ne10.

The code is written in C and assembler and has been tested on Ubuntu on ARM (Linaro). A Makefile is also included to build it for Android (AOSP). The current functions include vector and matrix operations accelerated by NEON instructions.
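As an illustration of the kind of matrix operation NEON accelerates, the sketch below multiplies a 4x4 float matrix by a 4-element vector. It is not taken from Ne10. mat4_mul_vec4 is a hypothetical name, not part of the library's API. The sketch assumes column-major storage and falls back to scalar code when NEON is not enabled.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not the Ne10 API): out = m * v, where m is a
 * column-major 4x4 matrix (m[0..3] is column 0, m[4..7] column 1, ...).
 * out must not alias v. */
void mat4_mul_vec4(float out[4], const float m[16], const float v[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t acc = vmulq_n_f32(vld1q_f32(m), v[0]);   /* column 0 * v[0] */
    acc = vmlaq_n_f32(acc, vld1q_f32(m + 4), v[1]);      /* += column 1 * v[1] */
    acc = vmlaq_n_f32(acc, vld1q_f32(m + 8), v[2]);      /* += column 2 * v[2] */
    acc = vmlaq_n_f32(acc, vld1q_f32(m + 12), v[3]);     /* += column 3 * v[3] */
    vst1q_f32(out, acc);
#else
    /* Scalar fallback: row r is the dot product of row r with v. */
    for (int r = 0; r < 4; r++)
        out[r] = m[r] * v[0] + m[4 + r] * v[1] + m[8 + r] * v[2] + m[12 + r] * v[3];
#endif
}
```

Column-major storage lets each column be loaded with one vld1q_f32 and scaled by a single vector element. The whole product then takes four vector multiply or multiply-accumulate instructions instead of sixteen scalar multiplies.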

Since the library is open source, ARM hopes developers will use Ne10 in their open source packages, add new functions, and port it to other operating systems.

In the video below, Rod Crawford, Principal Engineer at ARM, explains why ARM started the Ne10 project, what can be done with it, and what’s next for the project.

If you would like to contribute to the project, you can join the community at www.ProjectNe10.org.

If you are not familiar with NEON or just want to improve your skills, you can check the ARM NEON Tutorial in C and Assembler and/or read the five-part Coding for NEON series on the ARM blog.

Finally, you may also want to see the performance improvement NEON instructions bring to a real project (JPEG decoding) by reading the “Faster JPEG decoding on ARM with libjpeg-turbo and NEON Instructions” blog post.

ARM NEON Tutorial in C and Assembler

November 27th, 2011

The Advanced SIMD extension (aka NEON or “MPE” Media Processing Engine) is a combined 64- and 128-bit single instruction multiple data (SIMD) instruction set that provides standardized acceleration for media and signal processing applications similar to MMX, SSE and 3DNow! extensions found in x86 processors.

Doulos has a video tutorial that shows how to exploit NEON instructions in assembler and how to modify your C code to take advantage of them. It also gives the gcc compile options needed to enable NEON during the build.

Abstract:
With the v7-A architecture, ARM has introduced a powerful SIMD implementation called NEON™. NEON is a coprocessor which comes with its own instruction set for vector operations. While NEON instructions could be hand-coded in assembly language, ideally we want our compiler to generate them for us. Automatically analyzing whether an iterative algorithm can be mapped to parallel vector operations is not trivial, not least because the C language lacks the constructs necessary to support this. This paper explains how the RealView compiler tools (RVCT) and other modern compilers use a blend of sophisticated analysis techniques and language extensions to fulfill their job.
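One example of a C construct that helps with this analysis is the C99 restrict qualifier (gcc also accepts __restrict). Whether the white paper covers restrict specifically is an assumption here. The sketch only shows why pointer aliasing can block auto-vectorization. Both functions are hypothetical and compute the same result.

```c
#include <stddef.h>

/* Without restrict, the compiler must assume dst may overlap src, so a
 * vectorizer has to prove independence or emit runtime overlap checks. */
void scale_may_alias(float *dst, const float *src, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* restrict promises that dst and src do not overlap, removing that
 * obstacle; the loop becomes a straightforward vectorization candidate. */
void scale_restrict(float *restrict dst, const float *restrict src,
                    float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```

restrict is a promise made by the programmer. Calling scale_restrict with overlapping buffers is undefined behavior, which is exactly why the compiler cannot infer it on its own.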

You can download the white paper at http://www.doulos.com/knowhow/arm/using_your_c_compiler_to_exploit_neon/Resources/index.php (registration required).

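Here’s how to enable NEON instructions, with auto-vectorization, in the ARM gcc cross-compiler. gcc only runs the vectorizer when optimizing, and -mfpu=neon has no effect with -mfloat-abi=soft, so -O2 and -mfloat-abi=softfp are included:

arm-none-linux-gnueabi-gcc -O2 -mfpu=neon -mfloat-abi=softfp -ftree-vectorize -c sample.c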

and for the armcc compiler:

armcc --cpu=Cortex-A9 -O3 -Otime --vectorize --remarks -c fir_neon.c
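To confirm that these flags actually took effect, a source file can test the compiler's predefined macros. This is a minimal sketch assuming gcc's __ARM_NEON__ or the ACLE __ARM_NEON macro. The function name neon_enabled_at_build is hypothetical, and other toolchains may use different macros.

```c
/* Hypothetical check: returns 1 if this translation unit was compiled
 * with NEON enabled, 0 otherwise. With gcc, neither macro is defined
 * under -mfloat-abi=soft, even if -mfpu=neon is given. */
int neon_enabled_at_build(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}
```

Guarding intrinsics code with the same #if, as in the sketches above, keeps a file buildable when NEON is not enabled.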

I recommend watching the 17-minute video tutorial, which explains how to modify your C code to take advantage of NEON instructions, using a FIR filter as the example.
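The video's FIR code is not reproduced here. The hypothetical fir_f32 below sketches one vectorizer-friendly way to structure a FIR filter in C. It uses restrict-qualified pointers and an inner loop over output samples, so each step is an independent multiply-accumulate rather than a floating-point reduction. The sketch assumes the coefficients are stored time-reversed, so y[i] = sum of h[k] * x[i + k]. It also assumes x holds at least nout + ntaps - 1 samples.

```c
#include <stddef.h>

/* Hypothetical FIR sketch: y[i] = sum_{k=0}^{ntaps-1} h[k] * x[i + k]
 * for i in [0, nout). x must hold at least nout + ntaps - 1 samples,
 * and y must not overlap x or h (hence restrict). */
void fir_f32(float *restrict y, const float *restrict x,
             const float *restrict h, size_t nout, size_t ntaps)
{
    for (size_t i = 0; i < nout; i++)
        y[i] = 0.0f;

    /* Outer loop over taps, inner loop over outputs: the inner loop is an
     * element-wise multiply-accumulate over contiguous arrays with a
     * loop-invariant coefficient, with no reduction to reassociate. */
    for (size_t k = 0; k < ntaps; k++) {
        const float c = h[k];
        for (size_t i = 0; i < nout; i++)
            y[i] += c * x[i + k];
    }
}
```

Putting the loop over taps outside matters for gcc. A floating-point sum in the inner loop is normally only vectorized when reassociation is allowed, for example with -ffast-math. Note also that gcc only auto-vectorizes floating-point loops for 32-bit ARM NEON when -funsafe-math-optimizations (or -ffast-math) is given, because NEON flushes denormals to zero.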

Faster JPEG decoding on ARM with libjpeg-turbo and NEON Instructions

September 13th, 2011

libjpeg-turbo is based on libjpeg, but uses SIMD instructions (MMX, SSE2, etc.) to accelerate JPEG compression and decompression on x86 targets. On such systems, libjpeg-turbo is generally 2-4x as fast as the original version of libjpeg with the same hardware.

ARM processors do not support MMX or SSE2 instructions, but they have their own SIMD instructions, executed by the NEON engine in the ARM Cortex-A5, A8, A9 and A15 cores. ARM claims that “NEON technology can accelerate multimedia and signal processing algorithms such as video encode/decode, 2D/3D graphics, gaming, audio and speech processing, image processing, telephony, and sound synthesis by at least 3x the performance of ARMv5 and at least 2x the performance of ARMv6 SIMD.”

Linaro worked on libjpeg-turbo and added NEON support to it.

The code is available on launchpad at https://code.launchpad.net/~tom-gall/linaro/libjpeg-turbo

Linaro has also provided benchmark results for libjpeg-turbo decoding a 12-megapixel image on a TI OMAP4 (PandaBoard), using the command:

djpeg 12mp.jpeg > /dev/null

Non-optimized libjpeg-turbo (5 runs): 2078 ms average
Linaro’s NEON-optimized libjpeg-turbo (5 runs): 1676 ms average

That is a reduction in decoding time of almost 20% from the non-optimized libjpeg-turbo library to Linaro’s NEON-optimized version.
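The percentage follows directly from Linaro’s averages. Expressed as a speedup instead, the NEON build decodes about 1.24 times as fast:

```latex
\[
\frac{2078\,\text{ms} - 1676\,\text{ms}}{2078\,\text{ms}} = \frac{402}{2078} \approx 19.3\%
\qquad \text{(reduction in decoding time)}
\]
\[
\frac{2078\,\text{ms}}{1676\,\text{ms}} \approx 1.24
\qquad \text{(speedup)}
\]
```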

For further information about Linaro’s libjpeg-turbo optimization, go to the Optimize JPEG Decoding for ARM page. If you are interested in optimizing your code for NEON instructions, you can visit Optimizing Code for ARM Cortex-A8 with NEON SIMD, check the list of NEON C functions available when NEON is enabled (CFLAGS += -mfpu=neon), and/or read the ARM NEON™ Instruction Set and Why You Should Care slides presented at ELC 2011 by Mike Anderson, Chief Scientist at PTR Group.

Categories: Graphics, Programming Tags: arm, cortex, jpeg, linaro, neon