Posts Tagged ‘mali’

Open Source ARM Compute Library Released with NEON and OpenCL Accelerated Functions for Computer Vision, Machine Learning

April 3rd, 2017 11 comments

GPU compute promises to deliver much better performance than CPU compute for applications such as computer vision and machine learning, but the problem is that many developers may not have the right skills or time to leverage APIs such as OpenCL. So ARM decided to write their own ARM Compute library and has now released it under an MIT license.

The functions found in the library include:

  • Basic arithmetic, mathematical, and binary operator functions
  • Color manipulation (conversion, channel extraction, and more)
  • Convolution filters (Sobel, Gaussian, and more)
  • Canny Edge, Harris corners, optical flow, and more
  • Pyramids (such as Laplacians)
  • HOG (Histogram of Oriented Gradients)
  • SVM (Support Vector Machines)
  • H/SGEMM (Half and Single precision General Matrix Multiply)
  • Convolutional Neural Networks building blocks (Activation, Convolution, Fully connected, Locally connected, Normalization, Pooling, Soft-max)
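To make the H/SGEMM entry a bit more concrete, here is a minimal pure-Python sketch of what a general matrix multiply computes under the standard BLAS definition, i.e. alpha * A * B + beta * C. This is not the ARM Compute library API; the function name `gemm` and the sample matrices are made up for illustration, and the library would run the same arithmetic with NEON or OpenCL kernels instead of Python loops.

```python
def gemm(alpha, a, b, beta, c):
    """Reference GEMM: return alpha * (a x b) + beta * c.

    Matrices are row-major nested lists. `gemm` is a hypothetical helper
    name for this sketch, not a function from the ARM Compute library.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(c) != rows or any(len(row) != cols for row in c):
        raise ValueError("c must have shape rows(a) x cols(b)")
    return [
        [
            alpha * sum(a[i][k] * b[k][j] for k in range(inner)) + beta * c[i][j]
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# Illustrative 2x2 inputs (made-up values).
a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
result = gemm(2.0, a, b, 0.5, c)
print(result)
```

The triple loop is the part an accelerated implementation would vectorize or offload, which is why matrix multiply is a common benchmark for this kind of library.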

The library works on Linux, Android or bare metal on armv7a (32bit) or arm64-v8a (64bit) architecture, and makes use of NEON, OpenCL, or NEON + OpenCL. You’ll need an OpenCL capable GPU, so Mali-4xx GPUs won’t be fully supported, and you need an SoC with a Mali-T6xx, T-7xx, T-8xx, or G71 GPU to make use of the library, except for NEON only functions.

In order to showcase their new library, ARM compared its performance to the OpenCV library on a Huawei Mate 9 smartphone with a HiSilicon Kirin 960 processor and an ARM Mali G71MP8 GPU.

ARM Compute Library vs OpenCV, single-threaded, CPU (NEON)

Even with some NEON acceleration in OpenCV, Convolutions and SGEMM functions are around 15 times faster with the ARM Compute library. Note that ARM selected a hardware platform with one of their best GPUs, so while it should still be faster on other OpenCL capable ARM GPUs, the difference will be lower, but should still be significant, i.e. several times faster.

ARM Compute Library vs OpenCV, single-threaded, CPU (NEON)

The performance boost in other functions is not quite as impressive, but the compute library is still 2x to 4x faster than OpenCV.

While the open source release was just about three weeks ago, the ARM Compute library had already been utilized by several embedded, consumer and mobile silicon vendors and OEMs before it was open sourced, for applications such as 360-degree camera panoramic stitching, computational camera, virtual and augmented reality, segmentation of images, feature detection and extraction, image processing, tracking, stereo and depth calculation, and several machine learning based algorithms.

Mainline Linux on 64-bit ARM Amlogic SoCs, and TV Boxes such as Wetek Hub / Player 2, NEXBOX A1 / A95X, etc…

March 6th, 2017 30 comments

We’ve already seen, via our virtual schedule for the Embedded Linux Conference & OpenIoT Summit 2017, that Neil Armstrong, part of BayLibre, has worked on adding Amlogic SoC (S905/S905X/S912) support to mainline Linux. At the time, although we could see some activity in Linux 4.10, including support for Nexbox A95X and Nexbox A1, there were not many details about the work that had been done. Since then, ELC 2017 videos have been released, and BayLibre wrote a short post about 3D graphics support in mainline Linux.


We can see that I/Os, USB host, composite video output, Ethernet, eMMC/SDIO, and PSCI and SCPI features have already been added to Linux 4.10, but some important features are not there yet, including HDMI, Mali support, audio, and high speed eMMC modes. HDMI is actually planned for Linux 4.12, which could be released in about 18 weeks if we keep the 10-week kernel release schedule we had in the past. WeTek Hub and Play 2 device tree files have been submitted for Linux 4.11. Besides TV boxes, development boards such as ODROID-C2 and Khadas Vim will also be supported and benefit from this work.
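For the curious, the "about 18 weeks" estimate can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the Linux 4.10 release date (February 19, 2017) is my assumption and not stated in the post, the helper name `estimate_release` is made up, and actual kernel cycles vary by a week or so.

```python
from datetime import date, timedelta

# Assumption (not in the post): Linux 4.10 was released on 2017-02-19.
LINUX_4_10_RELEASE = date(2017, 2, 19)
# Publication date of this post.
POST_DATE = date(2017, 3, 6)
# The 10-week release cadence mentioned above.
CYCLE = timedelta(weeks=10)


def estimate_release(minor_versions_ahead):
    """Estimate a release date N kernel cycles after Linux 4.10."""
    return LINUX_4_10_RELEASE + minor_versions_ahead * CYCLE


# Linux 4.12 is two cycles after 4.10.
estimated_4_12 = estimate_release(2)
weeks_to_wait = (estimated_4_12 - POST_DATE).days / 7
print(f"Estimated Linux 4.12 release: {estimated_4_12}")
print(f"Weeks from post date: {weeks_to_wait:.1f}")
```

Two 10-week cycles after Linux 4.10 lands a little under 18 weeks after the post date, which matches the rough figure above.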

While Mali is not supported in mainline Linux yet, the patchsets for Mali-450 GPU are available on github in order to enable 3D graphics acceleration in Amlogic S905 and S905X. If you are interested in more details, you may want to watch Neil Armstrong’s presentation at ELC 2017, which explains the status of Amlogic Linux before working on mainline, the work achieved, the work in progress, and an overview of the community.

You may also want to download the presentation slides for an overview of the talk, and check out the BayLibre blog for future updates.

Thanks to Space Invader, theguyuk, and Harley for the tips.

Samsung Launches Exynos 9 Series 8895 SoC with Custom ARMv8 Cores, Mali-G71 GPU, Gigabit LTE Modem, 10nm FinFET Process

February 23rd, 2017 No comments

Samsung Electronics has just announced the launch of its latest Exynos application processor (AP), the Exynos 9 Series 8895 octa-core processor with four second generation custom designed ARMv8 CPU cores, and four Cortex A53 cores, as well as a Mali-G71 3D GPU, and a Gigabit LTE modem.

The LTE modem delivers data throughput at up to 1Gbps (Cat.16) downlink with 5CA (five carrier aggregation), and 150Mbps (Cat.13) uplink with 2CA. The SoC also embeds an “advanced MFC” (multi-format codec) for recording and playback at up to 4K UHD at 120 fps, a Vision Processing Unit for video tracking, image processing, and machine vision technology, and another processing unit that allows for mobile payments using iris or fingerprint recognition.

Exynos 8895 is also the first application processor manufactured with 10-nanometer (nm) FinFET process technology and improved 3D transistor structure, which according to Samsung, allows for up to 27% higher performance, while consuming 40% less power when compared to 14nm technology.

Samsung Exynos 9 Series 8895 is currently in mass production, and could be found in the next Galaxy S8 smartphone.

HiSilicon Kirin 960 Octa Core Application Processor Features ARM Cortex A73 & A53 Cores, Mali G71 MP8 GPU

October 20th, 2016 2 comments

Following the Kirin 950 processor found in Huawei Mate 8, P9, P9 Max & Honor 8 smartphones, HiSilicon has now unveiled the Kirin 960 octa-core processor with four ARM Cortex A73 cores, four Cortex A53 low power cores, a Mali G71 MP8 GPU, and an LTE Cat.12 modem.


The table below from Anandtech compares features and specifications of Kirin 950 against the new Kirin 960 processor.

SoC | Kirin 950 | Kirin 960
CPU | 4x Cortex A72 (2.3 GHz) + 4x Cortex A53 (1.8 GHz) | 4x Cortex A73 (2.4 GHz) + 4x Cortex A53 (1.8 GHz)
Memory | or LPDDR4-1333 (hybrid controller) |
GPU | ARM Mali-T880MP4 @ 900 MHz | ARM Mali-G71MP8 @ 900 MHz
Interconnect | ARM CCI-400 | ARM CCI-550
Video | 1080p H.264 Decode & Encode, 2160p30 HEVC | 2160p30 HEVC & H.264 Decode & Encode, 2160p60 HEVC
Camera/ISP | Dual 14bit ISP | Dual 14bit ISP
Sensor Hub | i5 | i6
Storage | eMMC 5.0 | UFS 2.1
Modem | Balong Integrated UE Cat. 6 LTE | UE Cat. 12 LTE, 4x CA, 4×4 MIMO

ARM claims a 30% “sustained” performance improvement between Cortex A72 and Cortex A73, but the GPU should be where the performance jump is more significant, as ARM promises a 50 percent increase in graphics performance, and a 20 percent improvement in power efficiency with Mali G71 compared to the previous generation (Mali-T880). Kirin 960 also integrates twice the GPU cores compared to Kirin 950, and some GPU benchmarks provided by HiSilicon/Huawei confirm the theory with over 100% performance improvement in both Manhattan 1080p offscreen and T-Rex offscreen GFXBench 4.0 benchmarks.

The first smartphone to feature Kirin 960 is likely to be the Huawei Mate 9, rumored to come with a 5.9″ 2K display, 6GB RAM, and 256GB UFS flash.

Open Source Mali-200 / Mali-400 GPU Lima Driver Gets New Commits

April 3rd, 2016 6 comments

The Lima driver, a project aimed at providing an open source driver for ARM Mali-400 and Mali-200 GPUs, was introduced 4 years ago, and after some reverse engineering work, a Quake 3 demo was showcased later in 2013 with an intermediate version of the Lima drivers. However, the main developer (libv) eventually lost interest or lacked time to work on it further, and the latest commit was made on June 9, 2013. But another developer (oklas) committed some code to limadriver-ng just a few days ago.

But don’t get too excited, as the modifications are minor with some build fixes, some other Makefile modifications, and only one C file modified with 6 new lines of code. But maybe that’s just the beginning… We’ll see.

Mali-400 GPU is now rather old, so why would somebody work on this? One explanation could be that C.H.I.P and Pine A64 boards are both based on Allwinner SoCs with a Mali-400 GPU, but a more likely explanation is that libv invited new developers:

2015-12-20: this project looking for developers, if you’d like to try, come to our IRC #lima :)

So we’ll have to see how this all turns out, and if somebody is indeed motivated to work on the port. If so, C.H.I.P and Pine A64 boards, as well as other Mali-400 based platforms, could get open source GPU drivers.

Thanks to Luka via Reddit, where you can find some more details about the timeline.

ARM Releases Kernel Drivers for Mali-T880 / T860 GPUs, User Space Drivers for Mali-T76x GPUs

February 23rd, 2015 17 comments

ARM Mali GPU drivers include both open source kernel drivers, and binary userspace drivers supporting framebuffer and/or X11 implementations. The former is rarely an issue and is quickly released, but the latter requires porting and testing for a specific hardware platform, as well as legal work, which greatly delays the releases.


Release r5p0-06rel0 for User Space Binary Drivers

Mali-T880 GPU was announced at the beginning of the month together with ARM Cortex A72, and on February 17, 2015, ARM released an update to their Mali-T600 series, Mali-T700 series & Mali-T860/T880 GPU kernel device drivers with revision r5p1-00rel0 that adds support for Mali-T860 and Mali-T880 GPUs. These open source drivers are available for Android and Linux, and also support earlier Mali-T700 and T600 GPUs.

Separately, the company has also released Mali-T76X GPU drivers for the Firefly board powered by the Rockchip RK3288 quad core Cortex A17 processor featuring a Mali-T764 GPU. The first release only supports the framebuffer driver, but ARM is expecting to be able to release the X11 version in the next release (r5p1) planned at the end of March, which means accelerated Linux desktop graphics will soon be available on Rockchip RK3288, and not only some OpenGL ES 3.0 demos on the framebuffer. The latest release (r5p0-06rel0) also supports the Exynos powered Arndale Octa board, Samsung Chromebook 2, Arndale board, and Samsung Chromebook. According to an ARM representative, Rockchip also plans to release their own Linux GPU drivers targeting the “TopMetal” hardware platform (should probably read PopMetal).

TyGL OpenGL ES 2.0 Backend for WebKit Speeds Up Web Rendering by Up to 11 Times

December 23rd, 2014 3 comments

ARM, Szeged University in Hungary, and Samsung Research UK have been working on TyGL, a new backend for WebKit accelerated with OpenGL ES 2.0, and developed and tested on the ARM Mali-T628 GPU found in the Samsung ARM Chromebook. It will typically provide 1.5 to 4.5 times higher performance, but in the best cases, it can achieve up to eleven times the performance of a CPU-only rendered page.

The key features of TyGL include:

  • Web rendering accelerated by GPU – Batching of draw calls delivers better results on GPUs. TyGL groups commands together to avoid frequent state changes while calling the Graphics Context API.
  • Automatic shader generation – TyGL generates complex shaders from multiple shader fragments, and ensures the batches fit into the shader cache of the GPU.
  • Trapezoid based path rendering – Work in progress. It will leverage GPU capabilities such as the Pixel Local Storage extension for OpenGL ES.
  • No software fallback – Complete GPU-based hardware accelerated solution with no dependency on legacy software.
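The batching idea from the feature list can be illustrated with a small Python sketch. To be clear, this is not TyGL's code: the render state names, the `batch_by_state` helper, and the sample draw calls are all hypothetical, and a real backend would deal with GL state and vertex buffers rather than strings. It only shows how grouping consecutive commands that share a state cuts down the number of state changes while keeping the drawing order.

```python
from itertools import groupby


def batch_by_state(draw_calls):
    """Group consecutive draw calls that share the same render state.

    Each draw call is a (state, primitive) tuple. Only neighbours are
    merged, so the original drawing order is preserved.
    `batch_by_state` is a hypothetical helper, not part of TyGL.
    """
    return [
        (state, [primitive for _, primitive in group])
        for state, group in groupby(draw_calls, key=lambda call: call[0])
    ]


# Made-up sequence of draw commands for one page.
calls = [
    ("solid_fill", "rect A"),
    ("solid_fill", "rect B"),
    ("image", "logo"),
    ("solid_fill", "rect C"),
    ("solid_fill", "rect D"),
]

batches = batch_by_state(calls)
for state, primitives in batches:
    print(f"set state {state!r}, then draw {primitives}")
print(f"{len(calls)} draw calls issued with {len(batches)} state set-ups")
```

A naive renderer would set up the state before each of the five calls, while the batched version only needs three set-ups for the same output.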

You can get more technical details about the implementation in the “TyGL: Hardware Accelerated Web Rendering” blog post on ARM community.

They have not officially published benchmark results, but I found some benchmark results on the WebKit mailing list:

Since EFL supports cairo, we compared EFL-TyGL and EFL-Cairo

The other good news is that TyGL is now open source, with the code available on github, and you can build it and give it a try on ARM Mali-T62X development boards such as Arndale Octa or ODROID-XU3 (Lite) running Ubuntu Linaro 14.04, or other Linux based distributions. The complete build is said to last about 10 hours, but this will obviously depend on your machine. TyGL should also work on other mobile GPUs supporting OpenGL ES 2.0, but I understand this has not been tested yet.

ARM Unveils Mali-T800 Series GPUs, Mali-V550 VPU, and Mali-DP550 Display Processor

October 28th, 2014 3 comments

ARM has just announced several new Mali media IP: three Mali-T800 series GPUs (Mali-T820, Mali-T830, and Mali-T860) based on Midgard architecture, as well as the Mali-V550 video accelerator, and the Mali-DP550 display processor.


Mali T800 Series GPU

The new Mali T-8xx GPUs are based on the same Midgard architecture used in Mali T-6xx and T-7xx GPUs, but deliver better power efficiency thanks to technologies such as ARM Frame Buffer Compression (AFBC), and Adaptive Scalable Texture Compression (ASTC) for input bandwidth reduction, as well as Transaction Elimination and Smart Composition.

ARM provided some performance and energy-efficiency comparisons between T800 and T600 series (but strangely nothing against T700):

  • The Mali-T820 GPU is optimized for entry-level products, achieving up to 40 percent more performance density compared to the Mali-T622 GPU.
  • The Mali-T830 GPU delivers up to 55 percent more performance than the Mali-T622 GPU.
  • The Mali-T860 GPU provides higher performance and 45 percent more energy-efficiency compared to the Mali-T628 GPU.


Mali-T860 supports up to 16 shader cores whereas Mali-T820 and Mali T-830 are limited to 4 shader cores. Supported APIs include OpenGL ES 3.1/3.0/2.0/1.1, DirectX 11, OpenCL 1.2/1.1, and RenderScript. Mali-T860 also provides 10-bit YUV input and output at full speed, which could be especially useful for 4K video using HEVC codec.

More details can be found on Mali-T860, Mali-T830 and Mali-T820 product pages.

Mali-V550 Video Processing Unit

Mali-V550 video processor fully supports the HEVC standard, and the single core version can decode/encode 1080p60 HEVC video, whereas the eight core version can handle 4K @ 120 Hz HEVC decoding/encoding.
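A quick pixel rate calculation shows why the eight core figure makes sense. This is just arithmetic on the numbers above, assuming "1080p" means 1920x1080 and "4K" means 3840x2160 (UHD), which ARM's announcement as quoted here does not spell out. The `pixel_rate` helper is made up for the example.

```python
def pixel_rate(width, height, fps):
    """Pixels per second for a given resolution and frame rate."""
    return width * height * fps


# Assumption: 1080p = 1920x1080 and 4K = 3840x2160 (UHD).
single_core = pixel_rate(1920, 1080, 60)
eight_core = pixel_rate(3840, 2160, 120)
ratio = eight_core / single_core

print(f"1080p60: {single_core / 1e6:.1f} Mpixels/s")
print(f"4K @ 120 Hz: {eight_core / 1e6:.1f} Mpixels/s")
print(f"Ratio: {ratio:.0f}x")
```

Under those assumptions, 4K at 120 Hz needs eight times the pixel throughput of 1080p60, which lines up with one core versus eight cores if performance scales linearly with the core count.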

Mali-V550 also benefits from new features such as Motion Search Elimination technology that reduces bandwidth by up to 35 percent, and will improve Wi-Fi Display/Miracast user experience. Up to 50% bandwidth reduction can also be achieved with AFBC. It also supports 10-bit YUV, so 10-bit HEVC/H.265 video can be supported when combined with a Mali-T800 GPU, with the VPU “feeding” 10-bit decoded data to the GPU. Other video codecs include the usual suspects, namely H.264, MPEG4, MPEG2, VP8, VC1, Real Media, H.263 and JPEG. VP9 support is not mentioned. Driver and video streaming infrastructure is based on OpenMAX.

Visit Mali-V550 product page for more information.

Mali-DP550 Display Processor

Mali-DP550 display processor will handle composition, scaling, rotation and image post-processing from the GPU in a single pass, and it also supports Motion Search Elimination, and AFBC to reduce bandwidth use in order to maximize battery life. Up to seven layers of composition, at up to 4K resolution, are supported, and a co-processor interface enables easy integration with third party IP blocks.

Single and dual display output are supported, as well as various YUV/RGB pixel formats, including 10-bit YUV. More details can be found on ARM’s Mali-DP550 page.

All three new ARM Mali media IPs are available for immediate licensing, and consumer devices are expected in late 2015 and early 2016.

Via Anandtech.

Categories: Graphics Tags: arm, gpu, h.265, hevc, mali, mali-t860, vp8