Archive

Posts Tagged ‘mali’

Allwinner SoCs with Mali GPU Get Mainline Linux OpenGL ES Support

September 26th, 2017 21 comments

OpenGL ES support in Linux for ARM SoCs is usually pretty hard to get because of closed source binary blobs coupled with the manufacturers’ focus on Android. Workarounds include open driver projects such as Freedreno for Qualcomm Adreno GPUs, Nouveau for Tegra, or Etnaviv for Vivante GPUs, as well as the libhybris library that converts Linux calls into Android calls in order to leverage existing Android GPU binary blobs. Allwinner processors rely on either PowerVR or ARM Mali GPUs. The former does not have any open source project, while some work is still ongoing for the latter with the Lima project, but it’s not ready yet.

That means that so far, your only option was to use libhybris for either GPU family. The good news is that Free Electrons engineers have been working on OpenGL ES support for the ARM Mali GPU in Allwinner processors, and have been allowed to release the userspace binary blobs. Not quite as exciting as an actual open source release, but at least, we should now be able to use OpenGL ES with mainline Linux on most Allwinner SoCs (the ones not using PowerVR GPUs).

If you want to try it on your platform, you’ll first need to add ARM Mali GPU device tree definitions to your platform’s DTS file if it is not already there, before building the open source Mali kernel module for your board:

This will install mali.ko module to your rootfs. The final step is to get the userspace drivers, either fbdev or X11-dma-buf depending on your setup, for example:

That should be all for the installation, and you should be able to test OpenGL ES using the es2_gears or glmark2-es2 programs. Based on the GitHub patchsets, this should currently work for Linux 4.6 to 4.14.
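Before running es2_gears or glmark2-es2, it can help to confirm that the kernel module is actually loaded. The snippet below is a minimal sketch, not part of the Free Electrons instructions: it assumes the module shows up as "mali" in /proc/modules (matching the mali.ko file name mentioned above), and the helper names are made up for illustration.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # The first whitespace-separated field of each line is the module name.
            names.add(fields[0])
    return names


def mali_loaded(proc_modules_text, module_name="mali"):
    """True if the (assumed) Mali module name appears among the loaded modules."""
    return module_name in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    path = Path("/proc/modules")
    if path.exists():
        print("mali loaded:", mali_loaded(path.read_text()))
    else:
        print("/proc/modules is not available on this system")
```

If the module is missing, the userspace blobs have nothing to talk to, so OpenGL ES programs will typically fail at context creation.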

Update: On a separate note, somebody has recently released ffmpeg 3.3.4 with the open source Cedrus driver for the Allwinner video processing unit, tested with Allwinner R40 and A64 SoCs. Code and package can be found on GitHub.

Open Source ARM Compute Library Released with NEON and OpenCL Accelerated Functions for Computer Vision, Machine Learning

April 3rd, 2017 12 comments

GPU compute promises to deliver much better performance than CPU compute for applications such as computer vision and machine learning, but the problem is that many developers may not have the right skills or time to leverage APIs such as OpenCL. So ARM decided to write their own ARM Compute library and has now released it under an MIT license.

The functions found in the library include:

  • Basic arithmetic, mathematical, and binary operator functions
  • Color manipulation (conversion, channel extraction, and more)
  • Convolution filters (Sobel, Gaussian, and more)
  • Canny Edge, Harris corners, optical flow, and more
  • Pyramids (such as Laplacians)
  • HOG (Histogram of Oriented Gradients)
  • SVM (Support Vector Machines)
  • H/SGEMM (Half and Single precision General Matrix Multiply)
  • Convolutional Neural Networks building blocks (Activation, Convolution, Fully connected, Locally connected, Normalization, Pooling, Soft-max)
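To make the H/SGEMM entry more concrete, here is a plain Python reference of what a general matrix multiply computes, namely C = alpha * A * B + beta * C. This is only an illustrative sketch of the operation itself, with a made-up function name. It is not the ARM Compute Library API, and it has none of the NEON or OpenCL acceleration that makes the library fast.

```python
def sgemm(alpha, a, b, beta, c):
    """Return alpha * (a x b) + beta * c for matrices given as lists of rows."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must have shape (rows of a) x (columns of b)")

    result = []
    for i in range(m):
        out_row = []
        for j in range(n):
            # Dot product of row i of a with column j of b.
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            out_row.append(alpha * acc + beta * c[i][j])
        result.append(out_row)
    return result


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    c = [[1, 1], [1, 1]]
    print(sgemm(1.0, a, b, 0.0, c))  # plain matrix product
    print(sgemm(2.0, a, b, 1.0, c))  # scaled product plus the existing c
```

The triple loop is exactly the kind of work that benefits from SIMD and GPU parallelism, which is why GEMM sits at the core of the convolution and fully connected building blocks listed above.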

The library works on Linux, Android or bare metal on armv7a (32-bit) or arm64-v8a (64-bit) architectures, and makes use of NEON, OpenCL, or NEON + OpenCL. You’ll need an OpenCL capable GPU, so Mali-4xx GPUs won’t be fully supported, and you need an SoC with a Mali-T6xx, T7xx, T8xx, or G71 GPU to make use of the library, except for NEON-only functions.

In order to showcase their new library, ARM compared its performance to the OpenCV library on the Huawei Mate 9 smartphone with a HiSilicon Kirin 960 processor and an ARM Mali G71MP8 GPU.

ARM Compute Library vs OpenCV, single-threaded, CPU (NEON)

Even with some NEON acceleration in OpenCV, Convolutions and SGEMM functions are around 15 times faster with the ARM Compute library. Note that ARM selected a hardware platform with one of their best GPUs, so while it should still be faster on other OpenCL capable ARM GPUs, the difference will be lower, but should still be significant, i.e. several times faster.

ARM Compute Library vs OpenCV, single-threaded, CPU (NEON)

The performance boost in other functions is not quite as impressive, but the compute library is still 2x to 4x faster than OpenCV.

While the open source release was just about three weeks ago, the ARM Compute library had already been utilized by several embedded, consumer and mobile silicon vendors and OEMs before it was open sourced, for applications such as 360-degree camera panoramic stitching, computational camera, virtual and augmented reality, segmentation of images, feature detection and extraction, image processing, tracking, stereo and depth calculation, and several machine learning based algorithms.

Mainline Linux on 64-bit ARM Amlogic SoCs, and TV Boxes such as Wetek Hub / Player 2, NEXBOX A1 / A95X, etc…

March 6th, 2017 30 comments

We’ve already seen that Neil Armstrong, part of BayLibre, worked on adding Amlogic SoCs (S905/S905X/S912) to mainline Linux via our virtual schedule for the Embedded Linux Conference & OpenIoT Summit 2017. At the time, although we could see some activity in Linux 4.10, including support for Nexbox A95X and Nexbox A1, they did not provide that many details about the work that had been done. Since then, ELC 2017 videos have been released, and BayLibre wrote a short post about 3D graphics support in mainline Linux.


We can see that I/Os, USB host, composite video output, Ethernet, eMMC/SDIO, and PSCI and SCPI features have already been added to Linux 4.10, but some important features are not there yet, including HDMI, Mali support, audio, and high speed eMMC modes. HDMI is actually planned for Linux 4.12, which could be released in about 18 weeks if we keep the 10-week kernel release schedule we had in the past. WeTek Hub and Play 2 device tree files have been submitted for Linux 4.11. Besides TV boxes, development boards such as ODROID-C2 and Khadas Vim will also be supported and benefit from this work.
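The 18-week estimate can be reproduced with simple date arithmetic. The sketch below assumes that Linux 4.10 was released on February 19, 2017 (a date that is not stated in this post) and that each following release takes exactly 10 weeks. Real release cycles vary by a week or so, so treat the result as a rough projection.

```python
from datetime import date, timedelta

# Assumption (not from the post): Linux 4.10 release date.
LINUX_4_10_RELEASE = date(2017, 2, 19)
WEEKS_PER_RELEASE = 10  # the release cadence assumed in the text


def estimated_release(releases_after_4_10):
    """Project a release date N kernel versions after 4.10 at a fixed cadence."""
    return LINUX_4_10_RELEASE + timedelta(weeks=WEEKS_PER_RELEASE * releases_after_4_10)


def weeks_until(target, today):
    """Number of weeks (as a float) between today and the target date."""
    return (target - today).days / 7


if __name__ == "__main__":
    post_date = date(2017, 3, 6)           # date of this post
    linux_4_12 = estimated_release(2)      # 4.12 is two releases after 4.10
    print("Estimated Linux 4.12 release:", linux_4_12)
    print("Weeks from the post date:", round(weeks_until(linux_4_12, post_date), 1))
```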

While Mali is not supported in mainline Linux yet, the patchsets for the Mali-450 GPU are available on GitHub in order to enable 3D graphics acceleration in Amlogic S905 and S905X. If you are interested in getting more details, you may want to watch Neil Armstrong’s presentation at ELC 2017, which explains the status of Amlogic Linux before working on mainline, the work achieved, the work in progress, and an overview of the community.

You may also want to download the presentation slides for an overview of the talk, and check out linux-meson.com and BayLibre blog for future updates.

Thanks to Space Invader, theguyuk, and Harley for the tips.

Samsung Launches Exynos 9 Series 8895 SoC with Custom ARMv8 Cores, Mali-G71 GPU, Gigabit LTE Modem, 10nm FinFET Process

February 23rd, 2017 No comments

Samsung Electronics has just announced the launch of its latest Exynos application processor (AP), the Exynos 9 Series 8895 octa-core processor with four second generation custom designed ARMv8 CPU cores and four Cortex A53 cores, as well as a Mali-G71 3D GPU and a Gigabit LTE modem.

The LTE modem delivers data throughput at up to 1Gbps (Cat.16) downlink with 5CA (five carrier aggregation), and 150Mbps (Cat.13) uplink with 2CA. The SoC also embeds an “advanced MFC” (multi-format codec) for recording and playback at up to 4K UHD at 120 fps, a Vision Processing Unit for video tracking, image processing, and machine vision technology, and another processing unit that allows for mobile payments using iris or fingerprint recognition.

Exynos 8895 is also the first application processor manufactured with 10-nanometer (nm) FinFET process technology and improved 3D transistor structure, which according to Samsung, allows for up to 27% higher performance, while consuming 40% less power when compared to 14nm technology.

Samsung Exynos 9 Series 8895 is currently in mass production, and could be found in the next Galaxy S8 smartphone.

HiSilicon Kirin 960 Octa Core Application Processor Features ARM Cortex A73 & A53 Cores, Mali G71 MP8 GPU

October 20th, 2016 2 comments

Following on from the Kirin 950 processor found in Huawei Mate 8, P9, P9 Max & Honor 8 smartphones, HiSilicon has now unveiled the Kirin 960 octa-core processor with four ARM Cortex A73 cores, four Cortex A53 low power cores, a Mali G71 MP8 GPU, and an LTE Cat.12 modem.


The table below from Anandtech compares features and specifications of Kirin 950 against the new Kirin 960 processor.

| SoC | Kirin 950 | Kirin 960 |
|---|---|---|
| CPU | 4x Cortex A72 (2.3 GHz), 4x Cortex A53 (1.8 GHz) | 4x Cortex A73 (2.4 GHz), 4x Cortex A53 (1.8 GHz) |
| Memory Controller | LPDDR3-933 or LPDDR4-1333 (hybrid controller) | LPDDR4-1800 |
| GPU | ARM Mali-T880MP4 @ 900 MHz | ARM Mali-G71MP8 @ 900 MHz |
| Interconnect | ARM CCI-400 | ARM CCI-550 |
| Encode/Decode | 1080p H.264 Decode & Encode, 2160p30 HEVC Decode | 2160p30 HEVC & H.264 Decode & Encode, 2160p60 HEVC Decode |
| Camera/ISP | Dual 14bit ISP, 940MP/s | Improved Dual 14bit ISP |
| Sensor Hub | i5 | i6 |
| Storage | eMMC 5.0 | UFS 2.1 |
| Integrated Modem | Balong Integrated UE Cat. 6 LTE | Integrated UE Cat. 12 LTE, 4x CA, 4×4 MIMO |

ARM claims a 30% “sustained” performance improvement between Cortex A72 and Cortex A73, but the GPU should be where the performance jump is more significant, as ARM promises a 50 percent increase in graphics performance and a 20 percent improvement in power efficiency with Mali G71 compared to the previous generation (Mali-T880). Kirin 960 also integrates twice the GPU cores compared to Kirin 950, and some GPU benchmarks provided by HiSilicon/Huawei confirm the theory with over 100% performance improvement in both Manhattan 1080p offscreen and T-Rex offscreen GFXBench 4.0 benchmarks.

The first smartphone to feature Kirin 960 is likely to be the Huawei Mate 9, rumored to come with a 5.9″ 2K display, 6GB RAM, and 256GB UFS flash.

Open Source Mali-200 / Mali-400 GPU Lima Driver Gets New Commits

April 3rd, 2016 6 comments

The Lima driver, a project aimed at providing an open source driver for ARM Mali-400 and Mali-200 GPUs, was introduced 4 years ago, and after some reverse engineering work, a Quake 3 demo was showcased in 2013 with an intermediate version of the Lima drivers. However, the main developer (libv) eventually lost interest or lacked time for further work, and the latest commit was made on June 9, 2013. But another developer (oklas) committed some code to limadriver-ng just a few days ago.

But don’t get too excited, as the modifications are minor with some build fixes, some other Makefile modifications, and only one C file modified with 6 new lines of code. But maybe that’s just the beginning… We’ll see.

Mali-400 GPU is now rather old, so why would somebody work on this? One explanation could be C.H.I.P and Pine A64 boards are both based on Allwinner SoCs with a Mali-400 GPU, but a more likely explanation is that libv invited new developers on limadriver.org:

2015-12-20: this project looking for developers, if you’d like to try, come to our IRC #lima :)

So we’ll have to see how this all turns out, and whether somebody is indeed motivated to work on the port. If so, C.H.I.P and Pine A64 boards, as well as other Mali-400 based platforms, could get open source GPU drivers.

Thanks to Luka via Reddit, where you can find some more details about the timeline.

ARM Releases Kernel Drivers for Mali-T880 / T860 GPUs, User Space Drivers for Mali-T76x GPUs

February 23rd, 2015 17 comments

ARM Mali GPU drivers include both open source kernel drivers and binary userspace drivers supporting framebuffer and/or X11 implementations. The former are rarely an issue and are quickly released, but the latter require porting and testing for a specific hardware platform, as well as legal work, which greatly delays the releases.


Release r5p0-06rel0 for User Space Binary Drivers

The Mali-T880 GPU was announced at the beginning of the month together with ARM Cortex A72, and on February 17, 2015, ARM released an update to their Mali-T600 series, Mali-T700 series & Mali-T860/T880 GPU kernel device drivers with revision r5p1-00rel0, which adds support for Mali-T860 and Mali-T880 GPUs. These open source drivers are available for Android and Linux, and also support earlier Mali-T700 and T600 GPUs.

Separately, the company has also released Mali-T76X GPU drivers for the Firefly board powered by the Rockchip RK3288 quad core Cortex A17 processor featuring a Mali-T764 GPU. The first release only supports the framebuffer driver, but ARM expects to be able to release the X11 version in the next release (r5p1) planned for the end of March, which means accelerated Linux desktop graphics will soon be available on Rockchip RK3288, and not only some OpenGL ES 3.0 demos on the framebuffer. The latest release (r5p0-06rel0) also supports the Exynos powered Arndale Octa board, Samsung Chromebook 2, Arndale board, and Samsung Chromebook. According to an ARM representative, Rockchip also plans to release their own Linux GPU drivers targeting the “TopMetal” hardware platform (should probably read PopMetal).

TyGL OpenGL ES 2.0 Backend for WebKit Speeds Up Web Rendering by Up to 11 Times

December 23rd, 2014 3 comments

ARM, Szeged University in Hungary, and Samsung Research UK have been working on TyGL, a new backend for WebKit accelerated with OpenGL ES 2.0, and developed and tested on the ARM Mali-T628 GPU found in the Samsung ARM Chromebook. It will typically provide 1.5 to 4.5 times higher performance, but in the best cases, it can achieve up to eleven times the performance of a CPU-only rendered page.

The key features of TyGL include:

  • Web rendering accelerated by GPU – Batching of draw calls delivers better results on GPUs. TyGL groups commands together to avoid frequent state changes while calling the Graphics Context API.
  • Automatic shader generation – TyGL generates complex shaders from multiple shader fragments, and ensures the batches fit into the shader cache of the GPU.
  • Trapezoid based path rendering – Work in progress. It will leverage GPU capabilities such as the Pixel Local Storage extension for OpenGL ES.
  • No software fallback – Complete GPU-based hardware accelerated solution with no dependency on legacy software.
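The batching idea from the first bullet can be illustrated with a small sketch. This is not TyGL’s actual implementation: the command format (a state key plus a payload) and the function name are hypothetical, and only consecutive commands are merged so that paint order is preserved.

```python
from itertools import groupby


def batch_draw_calls(commands):
    """Merge runs of consecutive (state, payload) commands sharing the same state.

    Each resulting batch would need one state change on the GPU instead of
    one per command. Paint order is kept because only adjacent commands merge.
    """
    batches = []
    for state, run in groupby(commands, key=lambda cmd: cmd[0]):
        batches.append((state, [payload for _, payload in run]))
    return batches


if __name__ == "__main__":
    # Hypothetical draw commands: (shader/state name, shape to draw).
    commands = [
        ("solid", "rect1"),
        ("solid", "rect2"),
        ("gradient", "rect3"),
        ("solid", "rect4"),
    ]
    batches = batch_draw_calls(commands)
    print(len(commands), "commands ->", len(batches), "batches")
    for state, payloads in batches:
        print(state, payloads)
```

In this sketch, four commands collapse into three batches. A real renderer would have more room to reorder commands when it can prove that shapes do not overlap, but that analysis is outside the scope of this example.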

You can get more technical details about the implementation on TyGL: Hardware Accelerated Web Rendering blog post on ARM community.

They have not officially published benchmark results, but I found some benchmark results on the WebKit mailing list:

Since EFL supports cairo, we compared EFL-TyGL and EFL-Cairo

The other good news is that TyGL is now open source, with the code available on GitHub, and you can build it and give it a try on ARM Mali-T62X development boards such as Arndale Octa or ODROID-XU3 (Lite) running Ubuntu Linaro 14.04, or other Linux based distributions. The complete build is said to last about 10 hours, but this will obviously depend on your machine. TyGL should also work on other mobile GPUs supporting OpenGL ES 2.0, but I understand this has not been tested yet.