Posts Tagged ‘gpu’

This Video Shows Vulkan API’s Higher Power Efficiency Compared to OpenGL ES API on ARM SoCs

October 20th, 2016 1 comment

Vulkan was introduced as the successor of OpenGL ES in March 2015, promising to use fewer CPU resources and to support multiple command buffers that can be created in parallel and distributed over several cores. This comes at the cost of slightly more complex application programming, since less work is done inside the GPU drivers themselves, and app developers need to handle memory allocation and thread management.

This was just a standard at the time, so Vulkan still needed some time to be implemented, and work is still in progress, but ARM showcased the power efficiency of Vulkan over OpenGL ES in the video embedded at the end of this post.

The demo has the same graphics details and performance using both OpenGL ES and Vulkan, but since the load on the CPU in that demo can be distributed over several CPU cores with Vulkan against a single core for OpenGL ES, it’s possible to use low power cores (e.g. Cortex A53) operating at a lower frequency and voltage, hence reducing power consumption.
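The reasoning above can be illustrated with the textbook CMOS dynamic power relation, P ≈ C × V² × f. The sketch below is only a rough illustration: the capacitance, voltage and frequency values are made up for the example and are not measurements from ARM's demo. It also ignores static leakage and assumes the work splits perfectly across cores.

```python
def dynamic_power(capacitance, voltage, frequency_hz):
    """Textbook CMOS dynamic power estimate: P = C * V^2 * f (in watts)."""
    return capacitance * voltage ** 2 * frequency_hz


# Hypothetical numbers, chosen only for illustration.
C = 1e-9  # effective switched capacitance per core, in farads

# One core handling the whole workload at high frequency and voltage.
single_core = dynamic_power(C, voltage=1.1, frequency_hz=2.0e9)

# Same total cycles per second spread over four cores, each running at a
# quarter of the frequency, which allows a lower supply voltage.
four_cores = 4 * dynamic_power(C, voltage=0.8, frequency_hz=0.5e9)

print(f"single core: {single_core:.2f} W")
print(f"four cores:  {four_cores:.2f} W")
print(f"ratio:       {four_cores / single_core:.2f}")
```

Because voltage enters squared, lowering it is what produces the saving; simply splitting the same frequency budget across cores at the same voltage would give no gain in this model.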

ARM also measured that the complete OpenGL ES demo used 1270 joules against 1123 joules for the Vulkan demo, quoted as about 15% energy savings in this “early stage” demo, although the two figures themselves work out to a little under 12%.
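As a quick check of the energy figures, the saving can be computed directly from the two joule values quoted above. This is plain arithmetic on the numbers in the post; the function name is just an illustrative helper.

```python
def saving_percent(baseline_joules, new_joules, relative_to):
    """Energy saved, as a percentage of the chosen reference value."""
    return (baseline_joules - new_joules) / relative_to * 100


OPENGL_ES_J = 1270
VULKAN_J = 1123

# Saving expressed relative to the OpenGL ES run (the usual convention).
vs_opengl = saving_percent(OPENGL_ES_J, VULKAN_J, relative_to=OPENGL_ES_J)

# Extra energy of OpenGL ES expressed relative to the Vulkan run.
vs_vulkan = saving_percent(OPENGL_ES_J, VULKAN_J, relative_to=VULKAN_J)

print(f"relative to OpenGL ES: {vs_opengl:.1f}%")
print(f"relative to Vulkan:    {vs_vulkan:.1f}%")
```

Either way of expressing it lands somewhat below the "about 15%" figure, so that number is best read as a rounded claim rather than something derived from these two measurements.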

Categories: Android, Video Tags: gpu, opengl, power, vulkan

HiSilicon Kirin 960 Octa Core Application Processor Features ARM Cortex A73 & A53 Cores, Mali G71 MP8 GPU

October 20th, 2016 2 comments

Following on from the Kirin 950 processor found in Huawei Mate 8, P9, P9 Max & Honor 8 smartphones, HiSilicon has now unveiled the Kirin 960 octa-core processor with four ARM Cortex A73 cores, four Cortex A53 low power cores, a Mali G71 MP8 GPU, and an LTE Cat.12 modem.


The table below from Anandtech compares features and specifications of Kirin 950 against the new Kirin 960 processor.

SoC | Kirin 950 | Kirin 960
CPU | 4x Cortex A72 (2.3 GHz) + 4x Cortex A53 (1.8 GHz) | 4x Cortex A73 (2.4 GHz) + 4x Cortex A53 (1.8 GHz)
Memory | … or LPDDR4-1333 (hybrid controller) | …
GPU | ARM Mali-T880MP4 @ 900 MHz | ARM Mali-G71MP8 @ 900 MHz
Interconnect | ARM CCI-400 | ARM CCI-550
Video | 1080p H.264 Decode & Encode, 2160p30 HEVC | 2160p30 HEVC & H.264 Decode & Encode, 2160p60 HEVC
Camera/ISP | Dual 14bit ISP | Dual 14bit ISP
Sensor Hub | i5 | i6
Storage | eMMC 5.0 | UFS 2.1
Balong Integrated modem | UE Cat. 6 LTE | UE Cat. 12 LTE, 4x CA, 4×4 MIMO

ARM claims a 30% “sustained” performance improvement between Cortex A72 and Cortex A73, but the GPU should be where the performance jump is more significant, as ARM promises a 50 percent increase in graphics performance and a 20 percent improvement in power efficiency with Mali G71 compared to the previous generation (Mali-T880). Kirin 960 also integrates twice as many GPU cores as Kirin 950, and some GPU benchmarks provided by HiSilicon/Huawei confirm the theory, with over 100% performance improvement in both Manhattan 1080p offscreen and T-Rex offscreen GFXBench 4.0 benchmarks.

The first smartphone to feature Kirin 960 is likely to be the Huawei Mate 9, rumored to come with a 5.9″ 2K display, 6GB RAM, and 256GB UFS flash.

Imagination Technologies Announces MIPS Warrior I-class I6500 Heterogeneous CPU with up to 384 Cores

October 13th, 2016 No comments

Imagination has just unveiled the successor to the MIPS I6400 64-bit Warrior core: the MIPS Warrior I-class I6500 heterogeneous CPU supports up to 64 clusters of up to 6 cores each (384 cores max), with each core running up to 4 threads (1536 max), combined with IOCUs (IO coherence units) and external IP such as PowerVR GPUs or other hardware accelerators.
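The maximum core and thread counts follow directly from the cluster limits quoted above. The small helper below (a hypothetical function, not part of any Imagination tool) makes the arithmetic explicit and can be reused for smaller configurations.

```python
def totals(clusters, cores_per_cluster, threads_per_core):
    """Return (total cores, total hardware threads) for a uniform configuration."""
    cores = clusters * cores_per_cluster
    threads = cores * threads_per_core
    return cores, threads


# Maximum I6500 configuration as described in the announcement.
max_cores, max_threads = totals(clusters=64, cores_per_cluster=6, threads_per_core=4)
print(max_cores, max_threads)
```

Real designs are unlikely to be uniform, since each core in a cluster can be configured with a different thread count, so this is an upper bound rather than a typical figure.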

The main features of the MIPS I6500 processor are listed as follows:


  • Heterogeneous Inside – In a single cluster, designers can optimize power consumption with the ability to configure each CPU with different combinations of threads, different cache sizes, different frequencies, and even different voltage levels.
  • Heterogeneous Outside – The latest MIPS Coherence Manager with an AMBA ACE interface to popular ACE coherent fabric solutions such as those from Arteris and Netspeed lets designers mix on a chip configurations of processing clusters – including PowerVR GPUs or other accelerators – for high system efficiency.
  • Simultaneous Multi-threading (SMT) – Based on a superscalar dual issue design implemented across generations of MIPS CPUs, this feature enables execution of multiple instructions from multiple threads every clock cycle, providing higher utilization and CPU efficiency.
  • Hardware virtualization (VZ) – I6500 builds on the real time hardware virtualization capability pioneered in the MIPS I6400 core. Designers can save costs by safely and securely consolidating multiple CPU cores with a single core, save power where multiple cores are required, and dynamically and deterministically allocate CPU bandwidth per application.
  • SMT + VZ – The combination of SMT with VZ in the I6500 offers “zero context switching” for applications requiring real-time response. This feature, alongside the provision of scratchpad memory, makes the I6500 ideal for applications which require deterministic code execution.
  • Designed for compute intensive, data processing and networking applications – The I6500 is designed for high-performance/high-efficiency data transfers to localized compute resources with data scratchpad memories per CPU, and features for fast path message/data passing between threads and cores.
  • OmniShield-ready – Imagination’s multi-domain security technology used across its processing families enables isolation of applications in trusted environments, providing a foundation for security by separation.

The processor is also based on the standard MIPS ISA, so developers will be able to leverage existing software and tools such as compilers, debuggers, operating systems, hypervisors and application software already optimized for the MIPS ISA.



The figure above shows what an SoC based on MIPS I6500 may look like: one cluster with 4 CPU cores and 2 IOCUs, another cluster without any CPU cores but with eight IOCUs interlinked with third party accelerators, and one PowerVR GPU.

Target applications include advanced driver assistance systems (ADAS), autonomous vehicles, networking, drones, industrial automation, security, video analytics, machine learning, and more. One of the first customers for the new processor is Mobileye, whose EyeQ5 SoC, designed for Fully Autonomous Driving (interestingly shortened as “FAD”) vehicles, will combine eight multi-threaded MIPS CPU cores with eighteen cores of Mobileye’s Vision Processors (VPs). EyeQ5 SoC should be found in vehicles as early as 2021.

MIPS I6500 CPU can be licensed now, with general availability planned for Q1 2017. You’ll find more technical details on the product page, and in the blog post for the announcement.

Nvidia Provides More Details About Parker Automotive SoC with ARMv8 Cores, Pascal GPU

August 23rd, 2016 9 comments

Nvidia demonstrated the DRIVE PX2 platform for self-driving cars at CES 2016, but did not give many details about the SoC used in the board. Today, the company has finally provided more information about the Parker hexa-core SoC, which combines two Denver 2 cores and four Cortex A57 cores with a 256-core Pascal GPU.

Nvidia Parker SoC specifications:

  • CPU – 2x Denver 2 ARMv8 cores, and 4x ARM Cortex A57 cores with 2MB + 2 MB L2 cache, coherent HMP architecture (meaning all 6 cores can work at the same time)
  • GPUs – Nvidia Pascal Geforce GPU with 256 CUDA cores supporting DirectX 12, OpenGL 4.5, Nvidia CUDA 8.0, OpenGL ES 3.1, AEP, and Vulkan + 2D graphics engine
  • Memory – 128-bit LPDDR4 with ECC
  • Display – Triple display pipeline, each at up to 4K 60fps.
  • VPU – 4K60 H.265 and VP9 hardware video decoder and encoder
  • Others:
    • Gigabit Ethernet MAC
    • Dual-CAN (controller area network)
    • Audio engine
    • Security & safety engines including a dual-lockstep processor for reliable fault detection and processing
    • Image processor
  • ISO 26262 functional safety standard for electrical and electronic (E/E) systems compliance
  • Process – 16nm FinFet
PX Drive 2 Board with two Parker SoCs

Parker is said to deliver up to 1.5 teraflops (native FP16 processing) of performance for “deep learning-based self-driving AI cockpit systems”.

This type of board and processor is normally only available to car and parts manufacturers, and the company claims more than 80 carmakers, tier 1 suppliers and university research centers are now using DRIVE PX 2 systems to develop autonomous vehicles. That means the platform should find its way into cars, trucks and buses soon, including some 100 Volvo XC90 SUVs that are part of an autonomous-car pilot program in Sweden slated to start next year.

Linux 4.7 Release – Main Changes, ARM and MIPS Architectures

July 25th, 2016 7 comments

Linux 4.7 is out:

So, after a slight delay due to my travels, I’m back, and 4.7 is out.

Despite it being two weeks since rc7, the final patch wasn’t all that big, and much of it is trivial one- and few-liners. There’s a couple of network drivers that got a bit more loving. Appended is the shortlog since rc7 for people who care: it’s fairly spread out, with networking and some intel Kabylake GPU fixes being the most noticeable ones. But there’s random small noise spread all over.

And obviously, this means that the merge window for 4.8 is open. Judging by the linux-next contents, that’s going to be a bigger release than the current one (4.7 really was fairly calm, I blame at least partly summer in the northern hemisphere).


Linux 4.6 brought USB 3.1 superspeed, OrangeFS distributed file system, 802.1AE MAC-level encryption (MACsec), and BATMAN V protocol support, improved the reliability of OOM task killer, and more.


The most noticeable changes in Linux 4.7 include:

  • Support for Radeon RX480 GPUs
  • Parallel directory lookups –  The directory cache caches information about path names to make them quickly available for pathname lookup. This cache uses a mutex to serialize lookup of names in the same directory.  The serializing mutex has been switched to a read-write semaphore in Linux 4.7, allowing for parallel pathname lookups in the same directory. Most filesystems have been converted to allow this feature.
  • New “schedutil” frequency governor –  There are two main differences between it and the existing governors. First, it uses information provided by the scheduler directly for making its decisions. Second, it can invoke cpufreq drivers and change the frequency to adjust CPU performance right away, without having to spawn work items to be executed in process context or similar, leading to lower latency to make frequency changes.
  • Histograms of events in ftrace – This release adds the “hist” command, which provides the ability to build “histograms” of events by aggregating event hits. As an example, let’s say a user needs to get a list of bytes read from files by each process. This information can be obtained using hist triggers, and other data can also be retrieved by using the fields found in /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/format.

    For more details, check the ftrace documentation and the related LWN article.
  • EFI ‘Capsule’ firmware updates – The EFI Capsule mechanism allows data blobs to be passed to the EFI firmware. The firmware then parses them and makes some decision based upon their contents. The most common use case is to bundle a flashable firmware image into a capsule that the firmware can use to upgrade the existing version in the flash on the next boot. Users can upload a capsule by writing the firmware to the /dev/efi_capsule_loader device.
  • Support for creating virtual USB Device Controllers in USB/IP – USB/IP allows real USB devices to be shared over the network. Linux 4.7 brings the ability to create virtual USB Device Controllers without needing any physical USB device, using the USB gadget subsystem. For what purpose? For example, for improving phone emulation in development environments, for testing USB and for educational purposes.
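
For the ftrace histogram item above, the sketch below shows how a hist trigger for the "bytes read per process" example could be built and installed. It is a minimal illustration, not the command from the original release notes: the helper names are hypothetical, and it assumes a kernel built with histogram trigger support, debugfs mounted at /sys/kernel/debug, and root privileges for the write.

```python
from pathlib import Path

# Assumed default debugfs location of the tracing directory.
TRACING_DIR = Path("/sys/kernel/debug/tracing")


def hist_trigger(keys, values=None, sort=None):
    """Build a trigger string of the form hist:keys=...[:values=...][:sort=...]."""
    parts = ["hist", "keys=" + ",".join(keys)]
    if values:
        parts.append("values=" + ",".join(values))
    if sort:
        parts.append("sort=" + sort)
    return ":".join(parts)


def install_trigger(event, trigger, tracing_dir=TRACING_DIR):
    """Write the trigger to events/<event>/trigger and return the hist file path.

    Needs root; not called at import time for that reason.
    """
    event_dir = tracing_dir / "events" / event
    (event_dir / "trigger").write_text(trigger + "\n")
    return event_dir / "hist"


# Key on the process (shown by name), sum the requested byte count per key,
# and list the biggest readers first.
trigger = hist_trigger(
    keys=["common_pid.execname"],
    values=["count"],
    sort="count.descending",
)
print(trigger)

# Usage on a suitable system, as root:
#   hist_file = install_trigger("syscalls/sys_enter_read", trigger)
#   print(hist_file.read_text())
```

Here "count" is the field of that name in the sys_enter_read event format, i.e. the number of bytes requested by each read() call.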

Some of the ARM specific improvements and new features include:

  • Allwinner:
    • Allwinner A13/R8 – Display Engine support
    • Allwinner A10/A20 – S/PDIF Support
    • Allwinner A31/A23/H3 – DMAengine improvements for H3 audio support
    • Allwinner H3 – USB support (multi-reset line support delayed til 4.8)
    • New hardware supported
      • Tablets – Dserve DSRV9703C, Polaroid MID2809PXE4, Colorfly e708 q1, Difrence DIT4350
      • Boards – Olimex A20 OLinuXino LIME2, Xunlong Orange Pi 2, Orange Pi One, and Orange Pi PC
  • Rockchip:
    • Thermal management – Rockchip driver support for RK3399, RK3366
    • Added Rockchip RK3399 clock and reset controller
    • Pinctrl – Support the .get_direction() callback in the GPIO portions
    • New RK3399 device tree support
    • Added Rockchip DisplayPort PHY support
    • Added Geekbuying GeekBox, RK3399 Evaluation Board, mqmaker MiQi SBC
  • Amlogic
    • Added Meson GXBB (S905) pinctrl support
    • Fixed memory nodes on Vega S95 DTS
    • Added Hardkernel ODROID-C2, Amlogic Meson GXBB P200 and P201 development systems
  • Samsung
    • Added Samsung ARTIK5 evaluation board
    • Added generic exynos bus frequency driver
    • Added pinctrl driver for Samsung EXYNOS5440 SoC
    • DTS updates & fixes:
      • Fix s5p-mfc driver probe on Exynos542x Peach boards (need to provide MFC memory banks). On these boards this was broken for long time but apparently no one enabled this driver till now.
      • Fix creation of debugfs entries for one regulator on Exynos4210 Trats board.
      • Fix probing of max8997 MFD driver (and its children) because of missing interrupt. Actually the current version of the driver probes (just without interrupts) but after switching to regmap and regmap-irq, the interrupt will be mandatory.
      • Cleanup regulator bindings on Exynos5420 boards.
      • Support MIC bypass in display path for Exynos5420.
      • Enable PRNG and SSS for all Exynos4 devices.
      • Add PL330 DMA controller and Thermal Management Unit to Exynos 7
      • Enable accelerated AES (Security SubSystem) on Exynos4412-based boards
      • Enable HDMI CEC on Exynos4412-based Odroid.
      • Add regulator supplies for eMMC/SD on Odroid XU3/XU4.
      • Fix DTC unit name warnings.
  • Qualcomm
    • Qualcomm IPQ4019 support in pinctrl
    • Change SMD callback parameters
    • 96Boards HiKey based on the Hisilicon Hi6220 (Kirin 620) gets an overhaul with a lot of devices enabled in the DT.
    • Added Qualcomm IPQ4019 “Internet processor”,  Arrow DragonBoard 600c (96boards) with APQ8064 Snapdragon 600
    • Device tree changes:
      • Add additional nodes for APQ8064
      • Fix APQ8064 pinctrls for i2c/spi
      • Add MSM8974 nodes for smp2p and smd
      • Modify MSM8974 memory reserve for rfsa and rmtfs
      • Add support for BQ27541 on Nexus7
  • Mediatek
    • Added  CPU power cooling model to Mediatek thermal driver
    • Added Mediatek MT8173 display driver, DRM driver, and thermal controller
    • Added MIPI DSI sub driver
    • 4GB mode support for Mediatek IOMMU driver
    • DTS updates:
      • add pinctrl node for mt2701
      • add mt2701 pmic wrapper binding
      • add auxadc binding document
  • Other new ARM hardware or SoCs – LG1312 TV SoC, Hisilicon Hip06/D03, Google Pixel C, NXP Layerscape 1043A QDS development board, Aspeed AST2400/AST2500, Oxnas 810SE (WD My Book World Edition), ARM MPS2 (AN385 Cortex-M3 & AN399 Cortex-M7), Ka-Ro electronics industrial SoM modules, Embest MarS Board, Boundary Devices i.MX6 Quad Plus Nitrogen6_MAX and SoloX Nitrogen6sx embedded boards, Technexion Pico i.MX6UL compute module, ZII VF610 Development Board, Linksys Viper (E4200v2 / EA4500) WiFi router, Buffalo Kurobox Pro NAS, samtec VIN|ING 1000 vehicle communication interface, Amazon Kindle Fire first generation tablet and ebook reader,  OnRISC Baltos iR 2110 and 3220 embedded industrial PCs, TI AM5728 IDK, TI AM3359 ICE-V2, and TI DRA722 Rev C EVM development systems.

MIPS architecture changelog:

  • Add support for relocatable kernel so it can be loaded someplace besides the default 1MB.
  • Add KASLR support using relocatable support
  • Add perf counter feature
  • Add support for extending builtin cmdline
  • seccomp: Support compat with both O32 and N32
  • ath79: Add support for DTB passed using the UHI boot protocol, remove the builtin DTB support, add zboot debug serial support, add initial support for DPT-Module, Dragino MS14 (Dragino 2), and Onion Omega
  • BMIPS: Add BCM6358 support, add Whirlwind (BMIPS5200) initialization code, add support for BCM63268
  • Lantiq: Add support for device tree file from boot loader
  • Add basic Loongson 3A support
  • Add support for CN73xx, CN75xx and CN78xx
  • Octeon: Add DTS for D-Link DSR-1000N
  • Detect DSP v3 support
  • Detect MIPSr6 Virtual Processor support
  • Enable ptrace hw watchpoints on MIPS R6
  • Probe the M6250 CPU and the P6600 core
  • Support sending SIG_SYS to 32bit userspace from 64bit kernel
  • qca: introduce AR9331 devicetree
  • ralink: add MT7628 EPHY LEDs pinmux support
  • smp-cps: Add nothreads kernel parameter
  • smp-cps: Support MIPSr6 Virtual Processors
  • MIPS64: Support a maximum of at least 48 bits of application virtual address space

For even more details, you can check out the Linux 4.7 changelog with comments only, generated using git log v4.6..v4.7 --stat. Alternatively, and much easier to read, you can head to to learn more about Linux 4.7 changes.

ARM announces “premium IP” for VR and AR with Cortex-A73 Processor and Mali-G71 GPU

May 30th, 2016 3 comments

Today ARM has revealed the first details of its latest mobile processor and GPU, both said to be optimized for VR (Virtual Reality) and AR (Augmented Reality) applications.

Starting with the ARM Cortex-A73, we’re looking at an evolution of the current Cortex-A72, with ARM claiming 30 percent higher “sustained” performance than the Cortex-A72 and over twice the performance of the Cortex-A57. ARM is already talking about clock speeds of up to 2.8GHz in mobile devices. Other improvements include L1 instruction and data caches of up to 64KB each, up from 48KB and 32KB respectively for the Cortex-A72, as well as up to 8MB of L2 cache.

The Cortex-A73 continues to support ARM’s big.LITTLE CPU design in combination with the Cortex-A53 or the Cortex-A35. It’s also the first ARM core to have been designed to be built using 10nm FinFET technology and it should be an extremely small CPU at around 0.65 square millimeters per core, or a 46 percent shrink from the Cortex-A72. By moving to 10nm and FinFET, ARM is also promising power efficiency gains of up to 20 percent over the Cortex-A72.
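The two area figures above can be combined to back out the Cortex-A72 core size they imply. This is a back-of-the-envelope calculation that assumes the "46 percent shrink" means the A73 core occupies 46 percent less area than the A72 core it is being compared with; the announcement does not spell out the comparison conditions.

```python
A73_AREA_MM2 = 0.65  # per-core area quoted for the Cortex-A73
SHRINK = 0.46        # quoted reduction relative to the Cortex-A72

# If A73 = A72 * (1 - shrink), then A72 = A73 / (1 - shrink).
implied_a72_area_mm2 = A73_AREA_MM2 / (1 - SHRINK)

print(f"implied Cortex-A72 core area: {implied_a72_area_mm2:.2f} mm^2")
```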

Cortex A53 vs A72 vs A73

The Mali-G71 GPU takes things even further, as ARM is promising a 50 percent increase in graphics performance, a 20 percent improvement in power efficiency and 40 percent more performance per square millimeter over its previous generation of GPUs. To accomplish this, ARM has designed the Mali-G71 to support up to 32 shader cores, which is twice as many as the Mali-T880, and ARM claims that this will enable the Mali-G71 to beat “many discrete GPUs found in today’s mid-range laptops”. We’d take this statement with a grain of salt, as it takes more than raw computing performance to make a good GPU, and that’s why there are so few companies that are still designing their own GPUs. As with the Cortex-A73, the Mali-G71 is optimized for 10nm FinFET manufacturing technology.

As always with ARM based GPUs, it depends on the partner implementation, and the Mali-G71 supports designs with as little as one shader. Looking at most current mobile GPU implementations, we’d expect most of ARM’s partners to go with a 4-8 shader implementation to keep their silicon cost at a manageable level. That said, we might get to see one or two higher-end implementations, as ARM has already gotten the likes of Samsung, MediaTek, Marvell and HiSilicon interested in its latest GPU.


With a big move towards VR and AR, it’s also likely that the ARM partners are going to have to move to a more powerful GPU to be able to deliver the kind of content that will be expected from these market spaces. According to the press release, it looks like ARM has already gotten Epic Games and Unity Technologies interested in supporting its latest GPU.

Devices using the new ARM Cortex-A73 and Mali-G71 are expected sometime in 2017, so there’s quite a gap between the announcement and the availability of actual silicon, but with HiSilicon, Marvell, MediaTek, Samsung Electronics and others having already licensed Cortex A73 IP, at least it means we have something to look forward to next year. You can find more details on ARM Cortex A73 and Mali-G71 pages, as well as ARM community’s blog.

PowerVR GT7200 Plus and GT7400 Plus GPUs Support OpenCL 2.0, Better Computer Vision Features

January 7th, 2016 2 comments

Imagination Technologies introduced the PowerVR Series7XT GPU family with up to 512 cores at the end of 2014, and at CES 2016 the company announced the Series7XT Plus family with the GT7200 Plus and GT7400 Plus GPUs. They keep many of the features of the Series7XT family, and add OpenCL 2.0 API support as well as improvements for computer vision: a new Image Processing Data Master, and support for 8-bit and 16-bit integer data paths instead of just 32-bit in the previous generation. The latter can lead to up to 4 times more performance for applications, such as deep learning, that leverage the OpenVX computer vision API.
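One way to picture where an "up to 4 times" figure for 8-bit data could come from is that four 8-bit values fit in the width of a single 32-bit value. The sketch below is a software analogy only, under that assumption: it uses the classic packed-arithmetic trick to add four 8-bit lanes with one 32-bit operation, and says nothing about how the PowerVR hardware is actually built. All function names are illustrative.

```python
HIGH_BITS = 0x80808080  # top bit of each 8-bit lane
LOW_BITS = 0x7F7F7F7F   # lower seven bits of each 8-bit lane


def pack_u8x4(values):
    """Pack four 0..255 integers into one 32-bit word (lane 0 in the low byte)."""
    return int.from_bytes(bytes(values), "little")


def unpack_u8x4(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return list(word.to_bytes(4, "little"))


def packed_add_u8(a, b):
    """Add four 8-bit lanes at once, each wrapping modulo 256.

    The low seven bits are added normally; the top bit of each lane is
    handled with XOR so carries never spill into the neighbouring lane.
    """
    return ((a & LOW_BITS) + (b & LOW_BITS)) ^ ((a ^ b) & HIGH_BITS)


a = pack_u8x4([1, 2, 3, 250])
b = pack_u8x4([10, 20, 30, 10])
print(unpack_u8x4(packed_add_u8(a, b)))

# Lanes per 32-bit word for each operand width.
for width in (32, 16, 8):
    print(f"{width}-bit operands: {32 // width} per 32-bit word")
```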

Block Diagram (Click to Enlarge)

The GT7200 Plus GPU features 64 ALU cores in two clusters, and the GT7400 Plus 128 ALU cores in a quad-cluster configuration. Besides OpenCL 2.0 and improvements for computer vision, they still support OpenGL ES 3.2, Vulkan, hardware virtualization, advanced security, and more. The company has also made some microarchitectural enhancements to improve performance and reduce power consumption:

  • Support for the latest bus interface features including requestor priority support
  • Doubled memory burst sizes, matching the latest system fabrics, memory controllers and memory components
  • Tuned the size of caches and improved their efficiency, leading to a ~10% reduction in bandwidth

The new features and improvements of PowerVR Series7XT Plus GPUs should help design better systems for image classification, face/body/gesture tracking, smart video surveillance, HDR rendering, advanced driver assistance systems (ADAS), object and scene reconstruction, augmented reality, visual inspection, robotics, etc.

You can find more details on Imagination Tech Blog.

Maxsun MS-GTX960 Nvidia GTX960 Graphics Card Unboxing and Installation

December 24th, 2015 16 comments

When I wrote an article about H.265 and VP9 video encoding, I noticed that only second generation Maxwell Nvidia graphics cards would support H.265 decoding (up to 500 fps) and HDMI 2.0 output, a few weeks after purchasing a first generation Nvidia GTX750 GPU… So when GearBest contacted me about graphics card reviews, I said I would be interested in an HDMI 2.0 and H.265 capable graphics card, which meant I had to get a card with an Nvidia GM20x chip, the cheapest being the GTX960. So the company agreed to send me the Maxsun MS-GTX960 graphics card matching my requirements for $240.04. I won’t use it for gaming at all, but instead I plan to use the card to evaluate Kodi 16.x 4K H.265 and VP9 support and compare video performance to the cheap and low power Amlogic S905 TV boxes on the market, as well as try out H.265 video encoding, as it should speed up the process by up to 50 times compared to software only encoding. But first, I’ll show a few pictures of the GPU, and the installation process, which is a little different from lower-end cards.

Maxsun MS-GTX960 Unboxing

I received the box via DHL, and was surprised by the rather large size of the package, and that I did not have to pay any custom duties for this type of item…

The card comes with 2GB GDDR5 RAM.

The graphics card does look quite large and includes two cooling fans.

The card has four video outputs: HDMI 2.0, DisplayPort, and two DVI ports.

There’s also a DVD or CDROM included with the graphics card, but I did not check it out, as the latest drivers are usually available online.

Maxsun MS-GTX960 Graphics Card Installation

This is what my previous Zotac GTX750 graphics card looks like when installed in my PC.

I’ve taken it out, and comparing it to the GTX960, I was worried the new card would not fit due to its much longer length.

While there are more ports, there’s no VGA output, so I’ll have to find a DVI cable for my secondary display. Not a big deal.

I was relieved when I realized the card would indeed fit into my computer, although it’s now pretty tight with my hard drive.

I also noticed a 6-pin connector on the top of the card, and after a Google search, I found it was to provide some extra power required for this type of card, and my power supply had this type of connector.


All good, I tightened the card with a screw, put all back together, and having upgraded from another Nvidia graphics card, the card was automatically recognized in Ubuntu 14.04, and worked out of the box.

I like when everything goes smoothly :).

Merry Christmas to all!!!

Categories: Graphics, Hardware, Linux, Ubuntu Tags: gpu, h.265, hdmi, nvidia