We’ve previously seen that GPU compute on ARM can improve performance for mobile, automotive and consumer electronics applications. GPU compute offloads CPU tasks that can be parallelized to the GPU using APIs such as OpenCL or RenderScript. Most applications that can leverage GPU compute are related to media processing (video decoding, picture processing, audio decoding, image recognition, etc…), but one thing I did not suspect could be improved is database access. That’s what Tom Gall of Linaro has achieved in a side project, using OpenCL to accelerate SQLite database operations by around 4 times for a given benchmark. The hardware used was a Samsung Chromebook with an Exynos 5250 SoC featuring a dual core Cortex A15 processor and an ARM Mali T604 GPU. GPU compute is only possible on ARM Mali T6xx and greater, and won’t work on Mali 400 / 450 GPUs. Other GPU vendors such as Vivante and Imagination […]
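To make the idea of offloading a parallelizable task more concrete, here is a minimal sketch in plain Python. It is not Tom Gall's code and it does not use OpenCL or SQLite. It only emulates, on the CPU, how a row filter could be written as a "kernel" that runs once per row, which is the shape of work a GPU can execute in parallel. The names `price_filter_kernel` and `run_kernel` and the sample data are hypothetical.

```python
# Illustrative only: CPU emulation of a data-parallel kernel launch.
# All names here are hypothetical; nothing is taken from SQLite or OpenCL.


def price_filter_kernel(global_id, prices, threshold, out):
    """One work-item: test a single row, independently of all other rows."""
    out[global_id] = 1 if prices[global_id] > threshold else 0


def run_kernel(kernel, global_size, *args):
    """Stand-in for a GPU launch; a real device would run these in parallel."""
    for global_id in range(global_size):
        kernel(global_id, *args)


prices = [5, 42, 17, 99, 3]
matches = [0] * len(prices)  # one output slot per work-item
run_kernel(price_filter_kernel, len(prices), prices, 20, matches)

# Gather the rows whose flag was set by the kernel.
selected = [p for p, m in zip(prices, matches) if m]
print(selected)
```

Because no work-item reads another work-item's output, the loop iterations can run in any order, which is what makes this kind of task a candidate for GPU offload.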
Phill Smith, Demo Manager at ARM, has filmed and uploaded four very interesting demos of new features made possible by the new generation ARM Mali-450 and Mali-T6xx GPUs, including 4K 3D user interfaces and games, ASTC texture compression, and OpenCL accelerated gesture recognition and HEVC / H.265 video decoding. 4K Resolution 3D User Interface and Game Demo The first demo showcases a Geniatech box (ATV1800?) powered by AMLogic AML8726-M8 featuring an ARM Mali-450MP6 GPU, running Android with a 4K 3D user interface designed by Autodesk using Scaleform UI. The rest of the video shows the Timbuktu 3D gaming demo running at 3840×2160 (4K2K) @ 24 fps. The frame rate appears to be low, but that’s because the box is using HDMI 1.4, which limits UHD output to 24fps. 2160p60 is only available via HDMI 2.0. ASTC Compression Demo on Samsung Galaxy Note 3 3D Textures are getting bigger and […]
Since the announcement of ARM Mali-T604 in 2010, ARM has explained that GPGPU (General Purpose computing on GPU), aka GPU Compute, would be one of the key features of their new Mali graphics processors, and the company now expects GPGPU to become mainstream in embedded and mobile devices in 2014 and beyond. I’ve just come across a presentation by Roberto Mijat, technical marketing manager at ARM, entitled “Unleashing the benefits of GPU Computing with ARM Mali”, which shows practical applications and use cases where RenderScript or OpenCL can deliver massive performance improvements, at much lower power consumption, over the same parallel tasks processed by the CPU only. Let’s have a look at some of the most interesting slides. GPU compute can be used for multiple applications in mobile, multimedia, and automotive sectors. GPU Compute for H.265 / HEVC HEVC aka H.265 is the next generation codec providing […]
For the very first time, ARM showcased one of their latest GPUs, the Mali-T604, at SIGGRAPH 2012. There were 3 demos running on a tablet reference platform based on the Samsung Exynos 5 Dual Cortex A15 processor clocked at 1.7 GHz: Timbuktu 2, showing improvements brought by OpenGL ES 3.0 (higher details buffers, shadow comparison, etc…); Hauntheim, showcasing multiple lights accelerated with GLES 3.0 and OpenCL (GPU compute); and Enlighten, a demo where you can adjust the sun position and see the building shadows move smoothly in real-time.
Imagination Technologies announced the first two IP cores of its PowerVR Series6 GPU family, namely the PowerVR G6200 and G6400. PowerVR Series6 relies on the PowerVR Rogue architecture, which is based on a scalable number of compute clusters and designed to target the requirements of new high-end graphics applications for smartphones, tablets, PC, console, automotive, DTV and more. The G6200 and G6400 have 2 and 4 compute clusters respectively. The company claims that PowerVR Series6 GPUs can deliver 20x or more the performance of current generation GPU cores targeting comparable markets thanks to the new Rogue architecture, which is around 5x more efficient than previous generations. The PowerVR Series6 GPU cores offer computing performance exceeding 100GFLOPS (gigaFLOPS) and reaching the TFLOPS (teraFLOPS) range. The PowerVR Series6 family introduces new technologies and features such as: An advanced scalable compute cluster architecture. High efficiency compression technology including lossless image and parameter compression […]
I’ve attended a webinar entitled “Harness the power and flexibility of the Midgard architecture for Embedded GPUs” presented by Steve Steele, Product Manager at ARM Media Processing Division, and sponsored by EETimes. Steve starts by talking about the current GPU architecture, “Utgard”, used in Mali-200, Mali-300 and Mali-400MP, which allows resolutions up to 1080p and is used in many smartphones today, including the Samsung Galaxy S2 (Mali-400MP), which provides great graphics performance. He then explains how mobile devices are used today and what performance we may expect in the future. Mobile as main compute platform: new UI and augmented reality, social networks and emails, content creation/consumption, 1 device to multiple screens (e.g. LCD screen and TV via HDMI). Evolving processing demand: graphics complexity multiplied by 25, increase in screen size (1080p resolution support). Graphics API: Khronos OpenGL ES, Microsoft DirectX 11. Compute API: OpenCL, Renderscript Compute and Direct Compute. After this overview, […]
OpenCL (Open Computing Language) is a multi-vendor open standard for general-purpose parallel programming of heterogeneous systems that include CPUs, GPUs and other processors. OpenCL provides a uniform programming environment for software developers to write efficient, portable code for high-performance compute servers, desktop computer systems and handheld devices. The OpenCL standard is managed and defined by the Khronos Group. The latest version (OpenCL 1.1) was ratified by the Khronos Group on the 14th of June 2010 and adds significant functionality for enhanced parallel programming flexibility, functionality and performance including: Host-thread safety, enabling OpenCL commands to be enqueued from multiple host threads. Sub-buffer objects to distribute regions of a buffer across multiple OpenCL devices. User events to enable enqueued OpenCL commands to wait on external events. Event callbacks that can be used to enqueue new OpenCL commands based on event state changes in a non-blocking manner. 3-component vector data types. Global work-offset which […]
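As a rough illustration of the global work-offset mentioned in the feature list above, the sketch below emulates a one-dimensional kernel launch in plain Python. It does not call any OpenCL API: `launch` and `square_kernel` are hypothetical names, and the two launches simply stand in for two devices. The one property it relies on is that, with an offset, the global IDs seen by the work-items start at the offset rather than at zero.

```python
# Illustrative only: CPU emulation of 1D launches with a global work offset.
# `launch` and `square_kernel` are hypothetical helpers, not OpenCL calls.


def launch(kernel, global_work_offset, global_work_size, *args):
    """Run one work-item per ID in [offset, offset + size)."""
    for i in range(global_work_size):
        kernel(global_work_offset + i, *args)


def square_kernel(global_id, src, dst):
    """One work-item: square the element at this global ID."""
    dst[global_id] = src[global_id] * src[global_id]


src = list(range(8))
dst = [None] * len(src)
half = len(src) // 2

# Two launches cover disjoint halves of the same index space, as if each
# half had been handed to a different device.
launch(square_kernel, 0, half, src, dst)
launch(square_kernel, half, len(src) - half, src, dst)

print(dst)
```

The kernel body is identical in both launches; only the offset changes, so no per-device index arithmetic has to be added to the kernel itself.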
Earlier this May, ZiiLABS announced the ZMS-20 dual core processors targeted at Android Honeycomb tablets. They have now unveiled the 3rd generation of Android tablet reference designs based on ZMS-20 and ZMS-40, called Jaguar3. Here’s an excerpt of the press release: ZiiLABS, a pioneering media processor and platforms company (a wholly-owned subsidiary of Creative Technology Ltd), today introduced its JAGUAR3, the most powerful 3rd Generation Android 3.2 tablet series. JAGUAR3 is a series of ultra-slim, ultra-lightweight and stylish 10.1” tablet reference designs targeted at the OEM markets. With over a decade of designing experience in portable mobile devices, Creative provided the ergonomic and sexy design of this series of JAGUAR3 tablets. ZMS-20 StemCell Processors JAGUAR3’s superior performance, low power consumption and rich feature set comes from the dual-core 1.5GHz ARM Cortex-A9 based ZMS-20 StemCell Processors. ZMS-20 has another 48 StemCell Processing cores within, which effectively make it into a 50-core processor. […]