ARM NEON Tutorial in C and Assembler

The Advanced SIMD extension (aka NEON or “MPE”, Media Processing Engine) is a combined 64- and 128-bit single instruction, multiple data (SIMD) instruction set that provides standardized acceleration for media and signal processing applications, similar to the MMX, SSE and 3DNow! extensions found in x86 processors. Doulos has a video tutorial showing how to exploit NEON instructions in assembler and how to modify your C code. It also gives the gcc compile options needed to enable NEON during the build. Abstract: With the v7-A architecture, ARM has introduced a powerful SIMD implementation called NEON™. NEON is a coprocessor which comes with its own instruction set for vector operations. While NEON instructions could be hand-coded in assembly language, ideally we want our compiler to generate them for us. Automatic analysis of whether an iterative algorithm can be mapped to parallel vector operations is not trivial, not least because the C language is […]
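Between hand-written assembler and fully automatic vectorization sits a middle route: NEON intrinsics from `arm_neon.h`. The sketch below is not taken from the Doulos tutorial. It adds two float arrays four lanes at a time using the intrinsics `vld1q_f32`, `vaddq_f32` and `vst1q_f32`, and uses a plain C loop for the remaining elements. The function name `add_floats` is hypothetical. The `#if` guard is an assumption about how you might keep the code portable: it falls back to scalar C when the compiler defines neither `__ARM_NEON` nor `__ARM_NEON__`. On a 32-bit ARMv7 gcc toolchain, NEON is typically enabled with options such as `-mfpu=neon -mfloat-abi=softfp` (or `hard`, depending on your ABI).

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: use NEON intrinsics only when the compiler says NEON is enabled. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for i in [0, n). */
void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));

#if HAVE_NEON
    /* Vector loop: load four floats from each input into 128-bit
     * Q registers, add them lane by lane, then store four results. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
#endif

    /* Scalar loop: handles the leftover 0-3 elements on NEON builds,
     * or the whole array when NEON is not available. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Keeping the scalar loop also gives you a reference to compare against. If gcc manages to auto-vectorize the plain loop (`-O3` or `-ftree-vectorize`), checking its results against the intrinsics version is a quick sanity test. For single-precision floats on ARMv7, gcc generally only emits NEON code from its auto-vectorizer when `-funsafe-math-optimizations` is also given, because NEON does not fully implement IEEE 754 (denormals are flushed to zero).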

Midgard architecture for Embedded GPUs (Mali-T604 / Mali-T658)

I attended a webinar entitled “Harness the power and flexibility of the Midgard architecture for Embedded GPUs”. It was presented by Steve Steele, Product Manager at the ARM Media Processing Division, and sponsored by EETimes. Steve starts by talking about the current GPU architecture, “Utgard”, used in the Mali-200, Mali-300 and Mali-400MP. Utgard supports resolutions up to 1080p and is found in many of today’s smartphones, including the Samsung Galaxy S2 (Mali-400MP), which delivers great graphics performance. He then explains how mobile devices are used today and what performance we may expect in the future:

- Mobile as the main compute platform:
  - New UI and Augmented Reality
  - Social networks and emails
  - Content creation/consumption
  - 1 device to multiple screens (e.g. LCD screen and TV via HDMI)
- Evolving processing demand:
  - Graphics complexity multiplied by 25
  - Increase in screen size (1080p resolution support)
- Graphics API: Khronos OpenGL ES, Microsoft DirectX 11
- Compute API: OpenCL, Renderscript Compute and DirectCompute

After this overview, […]

OpenMAX (Open Media Acceleration)

OpenMAX (Open Media Acceleration) is a royalty-free, cross-platform set of C-language programming interfaces that provides abstractions for routines especially useful for audio, video, and still images. The OpenMAX standard is managed by the non-profit technology consortium Khronos Group. OpenMAX allows developers to take advantage of hardware media decoding/encoding. For example, if you want to play video using the Raspberry Pi hardware (VideoCore IV GPU in the Broadcom BCM2835), you’ll have to use OpenMAX IL. OpenMAX provides three layers of interfaces:

- Application Layer (AL): open standard for accelerating the capture and presentation of audio, video, and images in multimedia applications on embedded and mobile devices.
- Integration Layer (IL): API defining a standardized media component interface that enables developers and platform providers to integrate and communicate with multimedia codecs implemented in hardware or software.
- Development Layer (DL): APIs containing a comprehensive set of audio, video and imaging functions that can be implemented and optimized […]

Device Tree Status Report – ELCE 2011

Grant Likely, owner of Secret Lab Technology, describes the current status of the device tree (used to resolve the ARM “hodgepodge” issue) and provides an example at the Embedded Linux Conference Europe 2011. Abstract: In recent years, Linux has enjoyed immense success in the embedded market, and we’ve seen an explosion in the number of devices supported by the mainline Linux kernel. Traditionally, however, adding support for another embedded machine typically involved adding yet another board.c file to the kernel, which more often than not was simply copied and pasted from a similar board. As a result, board support code contains a huge amount of duplication and has grown so large that it is becoming unmaintainable. To move away from individual board files, several architectures have adopted the Device Tree method of encoding the hardware details into a data structure which can be parsed by generic initialization code and device drivers. This session […]
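The core idea is that generic code finds the right driver by reading the hardware description instead of relying on a hard-coded board.c file. The sketch below is a simplified user-space model of that idea and is not the kernel's actual API. A device tree node's `compatible` property is a list of NUL-terminated strings, ordered from most to least specific. A hypothetical driver table is searched for each string in turn, so the most specific supported string wins. The names `struct dt_match` and `dt_find_driver` are made up for illustration. In the Linux kernel, the comparable mechanism is a driver's `of_match_table` of `struct of_device_id` entries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one driver match-table entry. */
struct dt_match {
    const char *compatible;   /* string the driver claims to support */
    const char *driver_name;  /* hypothetical driver identifier */
};

/*
 * `compat` holds the node's compatible strings back to back, most specific
 * first, each terminated by NUL; `len` is the total size in bytes including
 * every terminator. Returns the entry matching the most specific string the
 * table knows about, or NULL if no driver claims the node.
 */
const struct dt_match *dt_find_driver(const char *compat, size_t len,
                                      const struct dt_match *table,
                                      size_t table_len)
{
    const char *s = compat;
    const char *end = compat + len;

    assert(table != NULL || table_len == 0);

    while (s < end) {
        /* Try every driver against the current (most specific remaining) string. */
        for (size_t i = 0; i < table_len; i++) {
            if (strcmp(s, table[i].compatible) == 0)
                return &table[i];
        }
        s += strlen(s) + 1;   /* skip past this string and its NUL */
    }
    return NULL;
}
```

For example, a node whose property is `"acme,board-uart\0ns16550a"` would be matched by a generic 16550 entry in the table, even though no driver knows the board-specific name. This kind of fallback is what lets one driver serve many boards without per-board code.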

Developing Embedded Linux Devices Using the Yocto Project – ELCE 2011

Presentation entitled “Developing Embedded Linux Devices Using the Yocto Project and What’s new in 1.1” by David Stewart, Intel, at the Embedded Linux Conference Europe 2011. Abstract: The Yocto Project is a joint project to unify the world’s efforts around embedded Linux and to make Linux the best choice for embedded designs. The Yocto Project is an open source starting point for embedded Linux development which contains tools, templates, methods and actual working code to get started with an embedded device project. In addition, the Yocto Project includes Eclipse plug-ins to assist the developer. This talk gives a walk-through of the key parts of the Yocto Project for developing embedded Linux projects. It also describes features from the latest Yocto release (1.1). At the end of the talk, developers should be able to start their own embedded project using the Yocto Project and use it for developing the next […]

Energy Efficiency of ARM Architecture for Cloud Computing Applications

Following the “Pandaboard Cloud Cluster Running Google App Engine” post, readers asked about the actual power efficiency of ARM servers compared to Intel (Xeon) servers, and some commenters questioned the performance of ARM chips. I’ve found a thesis that evaluates how processors based on the ARMv7 architecture (Cortex-A9 and Cortex-A8) compare to Intel Xeon processors in energy efficiency. The comparison covers applications such as a SIP proxy and a web server (Apache2). The thesis focuses on energy efficiency rather than raw performance, where the Xeon largely outperforms the ARM processors, although a cluster of ARM servers could be used to reach the same processing power. Depending on the application, the benchmarks indicate 3 to 11 times higher energy efficiency for the ARM Cortex-A9 compared to the Intel Xeon. The full thesis (74 pages) is available below.
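Energy efficiency and raw performance can be separated with a little arithmetic. The formulas below assume that efficiency is measured as work done per unit of energy (e.g. requests per joule); the thesis's exact metric may differ. Let T be throughput (e.g. requests/s) and P the average power draw (W). The second line is the ratio the "3 to 11 times" figure refers to. The third shows why a cluster can match the Xeon's throughput while keeping ARM's efficiency, assuming ideal linear scaling and ignoring the power of networking and shared infrastructure.

```latex
\eta = \frac{\text{work}}{\text{energy}} = \frac{T \, t}{P \, t} = \frac{T}{P}

\frac{\eta_{\mathrm{ARM}}}{\eta_{\mathrm{Xeon}}}
  = \frac{T_{\mathrm{ARM}} / P_{\mathrm{ARM}}}{T_{\mathrm{Xeon}} / P_{\mathrm{Xeon}}}
  = \frac{T_{\mathrm{ARM}} \, P_{\mathrm{Xeon}}}{T_{\mathrm{Xeon}} \, P_{\mathrm{ARM}}}

N = \left\lceil \frac{T_{\mathrm{Xeon}}}{T_{\mathrm{ARM}}} \right\rceil,
\qquad
\eta_{\mathrm{cluster}} = \frac{N \, T_{\mathrm{ARM}}}{N \, P_{\mathrm{ARM}}} = \eta_{\mathrm{ARM}}
```

In other words, a Xeon that is much faster per node can still lose on energy if its power draw is proportionally even higher. Adding ARM nodes raises total throughput without changing the per-request energy, as long as scaling stays close to linear.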

Xibo Digital Signage on ARM (Full Version)

Last month, I wrote a post showing how to run Xibo Open Source Digital Signage in a BeagleBoard/Overo emulator. That version could communicate with the Xibo server, download the required files, display pictures and (maybe) play videos on the real hardware. However, it had a serious limitation: text, RSS and web pages could not be displayed. I’ve now fixed those issues, and the full Xibo 1.3.1 can run on ARM platforms. First, follow the instructions given in Xibo Digital Signage on ARM (Beagleboard / Overo), although the libavg compilation needs a small modification (see below). Then cross-compile Berkelium for ARM using the Linaro toolchain. Add libbrowser-node to the libavg plugin directory and build libavg again. Also copy the Berkelium header files into src/test/plugin (i.e. src/test/plugin/berkelium) or add the include file path to CFLAGS/CXXFLAGS.

Create libberkeliumwrapper.so:

Copy the required files to the qemu image: sudo mount -o […]

Run 2 OS Simultaneously on ARM (OMAP4) with Codezero Embedded Hypervisor

B Labs, a company specializing in ARM virtualization, was at ARM TechCon 2011 showcasing Codezero, their embedded hypervisor that runs multiple Linux-based operating systems such as Android and Chrome OS on ARM processors. The main purpose of running 2 operating systems is to separate home and enterprise operating systems on mobile devices so that enterprise data stays safe. Charbax (ARMDevices.net) interviewed Bahadir Baldan, founder of B Labs, who showed one demo running 2 Android instances and another running Android and Linux on a PandaBoard. According to B Labs, the overhead is 10 to 15%, so the performance hit is relatively small. They have already managed to run 4 OSes on quad-core processors with good performance. They are not yet able to run Windows operating systems (e.g. Windows Phone 7.5 / Windows 8), because Cortex-A9 processors lack hardware virtualization extensions. This should, however, become feasible with Cortex-A15 processors, which add hardware virtualization support. […]
