Archive

Posts Tagged ‘gpu’

NVIDIA Introduces Jetson TX2 Embedded Artificial Intelligence Computer

March 8th, 2017 9 comments

NVIDIA has just announced an upgrade to their Jetson TX1 module: the Jetson TX2 “Embedded AI Computer” based on the Tegra X2 “Parker” SoC, which either doubles the performance of its predecessor or runs at more than twice the power efficiency, while drawing less than 7.5 watts of power.

The company provided a comparison showing the differences between TX1 and TX2 modules.

|  | Jetson TX2 | Jetson TX1 |
|---|---|---|
| GPU | NVIDIA Pascal, 256 CUDA cores | NVIDIA Maxwell, 256 CUDA cores |
| CPU | HMP Dual Denver 2 (2 MB L2) + Quad ARM® A57 (2 MB L2) | Quad ARM® A57 (2 MB L2) |
| Video | 4K x 2K 60 Hz encode (HEVC), 4K x 2K 60 Hz decode (12-bit support) | 4K x 2K 30 Hz encode (HEVC), 4K x 2K 60 Hz decode (10-bit support) |
| Memory | 8 GB 128-bit LPDDR4, 58.3 GB/s | 4 GB 64-bit LPDDR4, 25.6 GB/s |
| Display | 2x DSI, 2x DP 1.2 / HDMI 2.0 / eDP 1.4 | 2x DSI, 1x eDP 1.4 / DP 1.2 / HDMI |
| CSI | Up to 6 cameras (2-lane), CSI2 D-PHY 1.2 (2.5 Gbps/lane) | Up to 6 cameras (2-lane), CSI2 D-PHY 1.1 (1.5 Gbps/lane) |
| PCIe | Gen 2, 1×4 + 1×1 or 2×1 + 1×2 | Gen 2, 1×4 + 1×1 |
| Data Storage | 32 GB eMMC, SDIO, SATA | 16 GB eMMC, SDIO, SATA |
| Other | CAN, UART, SPI, I2C, I2S, GPIOs | UART, SPI, I2C, I2S, GPIOs |
| USB | USB 3.0 + USB 2.0 | USB 3.0 + USB 2.0 |
| Connectivity | 1x Gigabit Ethernet, 802.11ac WLAN, Bluetooth | 1x Gigabit Ethernet, 802.11ac WLAN, Bluetooth |
| Mechanical | 50 mm x 87 mm (400-pin compatible board-to-board connector) | 50 mm x 87 mm (400-pin compatible board-to-board connector) |

The module still supports Linux for Tegra, as well as JetPack 3.0 SDK for AI computing with the following:

  • TensorRT 1.0, a neural network inference engine for production deployment of deep learning applications
  • cuDNN 5.1, a GPU-accelerated library of primitives for deep neural networks
  • VisionWorks™ 1.6, a software development package for computer vision and image processing
  • The latest graphics drivers and APIs, including OpenGL 4.5, OpenGL ES 3.2, EGL 1.4 and Vulkan 1.0
  • CUDA 8, which turns the GPU into a general-purpose massively parallel processor, giving developers access to tremendous performance and power-efficiency

Just like with the Jetson TX1 module, NVIDIA also provides a Jetson TX2 Developer Kit with a carrier board, a Jetson TX2 module, and various accessories. It can be pre-ordered for $599 in the United States and Europe, with shipping starting on March 14; the devkit will launch in other regions in a few weeks. With the launch of the new TX2 devkit, NVIDIA also reduced the price of the Jetson TX1 developer kit to $499.

You’ll find more details, and the pre-order link on NVIDIA’s Embedded Modules & Devkits page.

Linux 4.10 Release – Main Changes, ARM & MIPS Architectures

February 20th, 2017 3 comments

Linus Torvalds has just released Linux 4.10:

So there it is, the final 4.10 release. It’s been quiet since rc8, but we did end up fixing several small issues, so the extra week was all good.

On the whole, 4.10 didn’t end up as small as it initially looked. After the huge release that was 4.9, I expected things to be pretty quiet, but it ended up very much a fairly average release by modern kernel standards. So we have about 13,000 commits (not counting merges – that would be another 1200+ commits if you count those). The work is all over, obviously – the shortlog below is just the changes in the last week, since rc8.

Go out and verify that it’s all good, and I’ll obviously start pulling stuff for 4.11 on Monday. Linus

Linux 4.9 added Greybus staging support, improved security thanks to virtually mapped kernel stacks and memory protection keys, and included various file system improvements, among many other changes.

Some newsworthy changes for Linux 4.10 include:

  • Virtual GPU support – Intel GVT-g for KVM (KVMGT) is a full GPU virtualization solution with mediated pass-through, starting from 4th generation Intel Core processors with Intel Graphics. Unlike direct pass-through alternatives, the mediated device framework allows KVMGT to offer a complete virtualized GPU with full GPU features to each one of the virtualized guests, with part of performance critical resources directly assigned, while still having performance close to native.
  • New ‘perf c2c’ tool, for cacheline contention analysis – perf c2c (for “cache to cache”) is a new tool designed to analyse and track down performance problems caused by false sharing on NUMA systems. The tool is based on x86’s load latency and precise store facility events provided by Intel CPUs. Visit C2C – False Sharing Detection in Linux Perf for more details about the tool.
  • Improved writeback management – The Linux 4.10 release adds a mechanism that throttles buffered writeback, which makes it more difficult for heavy writers to monopolize the I/O request queue, and thus provides a smoother experience in Linux desktops and shells than what people were used to. The throttling algorithm monitors request latencies and shrinks or grows the request queue depth accordingly, which means it is auto-tunable, and a user would generally not have to touch the settings. Read Toward less-annoying background writeback for more details about this improvement.
  • FAILFAST support – This release also adds “failfast” support. RAID disks with failing I/Os are marked as broken quickly and avoided in the future, which can improve latency.
  • Faster Initial WiFi Connection – Linux 4.10 adds support for using drivers with Fast Initial Link Setup as defined in IEEE 802.11ai. It enables a wireless LAN client to achieve a secure link setup within 100ms. This release covers only the FILS authentication/association functionality from IEEE 802.11ai, i.e., the other changes like scanning optimizations are not included.
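False sharing, the problem ‘perf c2c’ tracks down, is easy to reproduce in user space: two threads each hammer their own counter, but because both counters sit on the same cache line, the cores keep invalidating each other’s copy. Here is a minimal C++ sketch (not kernel code; the 64-byte padding value is an assumption about the cache line size) — profiling the two variants is the sort of workload where the padded layout runs markedly faster:

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Two counters sharing one cache line: each thread's write invalidates
// the other core's cached copy of the line (false sharing).
struct SharedLine {
    std::atomic<uint64_t> a{0};
    std::atomic<uint64_t> b{0};
};

// The same counters forced onto separate cache lines (assumed 64 bytes).
struct PaddedLines {
    alignas(64) std::atomic<uint64_t> a{0};
    alignas(64) std::atomic<uint64_t> b{0};
};

// Run both counters concurrently and return their combined total.
template <typename Counters>
uint64_t run(Counters& c, uint64_t iters) {
    std::thread t1([&] {
        for (uint64_t i = 0; i < iters; ++i)
            c.a.fetch_add(1, std::memory_order_relaxed);
    });
    std::thread t2([&] {
        for (uint64_t i = 0; i < iters; ++i)
            c.b.fetch_add(1, std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    return c.a.load() + c.b.load();
}
```

Both variants compute identical results; only the cache-line traffic differs, which is exactly what perf c2c visualizes per cache line.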

Some notable ARM architecture improvements and new features:

  • Allwinner:
    • Allwinner A23 – Audio codec driver
    • Allwinner A31/A31s – Display Driver (first pipeline), audio codec support
    • Allwinner A64 – clock driver
    • Allwinner A80 – External SDIO WiFi
    • Allwinner H3 – Audio codec driver, SPI
    • New boards support: NextThingCo CHIP Pro, Pine A64, NanoPi M1
  • Rockchip:
    • Initial support for Rockchip PX5 & PX3 automotive platforms
    • Added Rockchip RK1108 evaluation board
    • Added support for Rikomagic MK808 Android TV stick based on Rockchip RK3066
    • Update Rockchip PCI driver to support for max-link-speed
    • Rockchip rk3399,rk3066 PLL clock optimizations
  • Amlogic
    • Support for the pre-release “SCPI” firmware protocol shipped by Amlogic in their GXBB SoC
    • Initial support for Amlogic S905D, and S912 (GXM) SoCs
    • Added support for Nexbox A1 and A95X Android TV boxes
    • Cleanup for the Amlogic Meson PWM driver
    • New Amlogic Meson graphics controller driver for GXBB (S905) / GXL (S905X/S905D) / GXM (S912) SoCs (meson)
    • Resets for 2nd USB PHY
    • Initial support for the SD/eMMC controller in the Amlogic S905/GX* family of SoCs
    • Updated DTS to enable support for USB, I2C, SPI, mailbox/MHU, PWM, Ethernet MAC & PHY, secure monitor, IR, and watchdog
  • Samsung
    • Device Tree for Samsung Exynos5433 mobile phone platform, including an (almost) fully supported phone reference board
    • Added support for TOPEET itop/elite board based on exynos4412
    • DeviceTree updates:
      • Add Performance Monitor Unit to Exynos7.
      • Add MFC, JPEG and Gscaler to Exynos5433 based TM2 board.
      • Cleanups and fixes for recently added TM2 and TM2E boards.
      • Enable ADC on Odroid boards
      • Remove unused Exynos4415 DTSI
  • Qualcomm
    • Add support for Qualcomm MSM8992 (Snapdragon 808) and MSM8994 (Snapdragon 810) mobile phone SoCs
    • Added support for Huawei Nexus 6P (Angler) and LG Nexus 5X (Bullhead) smartphones
    • Support for Qualcomm MDM9615 LTE baseband
    • Support for the WP8548 MangOH open hardware platform for IoT, based on Qualcomm MDM9615
    • Other device tree changes:
      • Added SDHC xo clk and 1.8V DDR support
      • Add EBI2 support to MSM8660
      • Add SMSC ethernet support to APQ8060
      • Add support for display, pstore, iommu, and hdmi to APQ8064
      • Add SDHCI node to MSM8974 Hammerhead
      • Add Hexagon SMD/PIL nodes
      • Add DB820c PMIC pins
      • Fixup APQ8016 voltage ranges
      • Add various MSM8996 nodes to support SMD/SMEM/SMP2P
  • Mediatek
    • Added clock for Mediatek MT2701 SoCs
    • New Mediatek drivers: mtk-mdp and mtk-vcodec (VP8/VP9/H.264) for MT8173
    • Updated the Mediatek IOMMU driver to use the new struct device->iommu_fwspec member
  • Other new ARM hardware platforms and SoCs:
    • Hisilicon – Hip07 server platform and D05 board
    • NXP – LS1046A communication processor, i.MX 6ULL SoC, UDOO Neo board, Boundary Devices Nitrogen6_SOM2 (i.MX6), Engicam i.CoreM6, Grinn i.MX6UL liteSOM/liteBoard, Toradex Colibri iMX6 module
    • Nvidia – Early support for the NVIDIA Tegra186 SoC, NVIDIA P2771 board, and NVIDIA P3310 processor module
    • Marvell – Globalscale Marvell ESPRESSOBin community board based on Armada 3700, Turris Omnia open source hardware router based on Armada 385
    • Renesas – “R-Car Starter Kit Pro” (M3ULCB) low-cost automotive board, RZ/G (r8a7743 and r8a7745) application processors
    • Oxford semiconductor (now Broadcom) OX820 SoC for NAS devices, Cloud Engines PogoPlug v3 based on OX820
    • Broadcom – Various wireless devices: Netgear R8500 router, Tenda AC9 router, TP-LINK Archer C9 V1, Luxul XAP-1510 Access point
    • STMicro – STM32F746 Cortex-M7 based microcontroller
    • Texas Instruments – DRA71x automotive processors, AM571x-IDK industrial board based on TI AM5718
    • Altera – Macnica Sodia development platform for Altera socfpga (Cyclone V)
    • Xilinx – MicroZed board based on Xilinx Zynq FPGA platforms

That’s a long list of changes and new boards and devices… Linux 4.10 brings only a few MIPS changes, however:

  • KVM fixes: fix host kernel crashes when receiving a signal with 64-bit userspace, flush instruction cache on all vcpus after generating entry code (both for stable)
  • uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint creation
  • RTC updates: Remove obsolete code and probe the jz4740-rtc driver from devicetree for jz4740, qi_lb60
  • microblaze/irqchip: Moved intc driver to irqchip. The Xilinx AXI Interrupt Controller IP block is used by the MIPS based xilfpga platform and a few PowerPC based platforms.
  • crypto: poly1305 – Use unaligned access where required, which speeds up performance on small MIPS routers.
  • MIPS: Wire up new pkey_{mprotect,alloc,free} syscalls

You can also read the Linux 4.10 changelog with comments only, generated using git log v4.9..v4.10 --stat, in order to get a full list of changes. Alternatively, you could also read the Linux 4.10 changelog on kernelnewbies.org.

Self-hosted OpenGL ES Development on Ubuntu Touch

January 15th, 2017 4 comments

Blu wrote the BQ Aquaris M10 Ubuntu Edition review – from a developer’s perspective – last year, and is now back with a new post explaining how to develop and deploy OpenGL ES applications directly on the Ubuntu Touch tablet.

Ever since I started using a BQ M10 for console apps development on the go I’ve been wanting to get something, well, flashier going on that tablet. Since I’m a graphics developer by trade and by heart, GLES was the next step on the Ubuntu Touch for me. This article is about writing, building and deploying GLES code on Ubuntu Touch itself, sans a desktop PC. Keep that in mind if some procedure seems unrefined or straight primitive to you – for one, I’m a primitive person, but some tools available on the desktop are, in my opinion, impractical on the Touch itself. That means no QtCreator today, nor Qt, for that matter.

The display of any contemporary Ubuntu Touch device is powered by Mir – a modern compositor/surface manager taking care of all (rectangular-ish) things eventually appearing on screen. We won’t be delving much into Mir beyond obtaining an EGL context (EGL being the binding layer between GLES and the native windowing system). But enough ado – let’s get to work.

Preparations for doing GLES on a Ubuntu Touch box:

The above, as of the time of this writing, should provide you with gcc/g++-4.9, make and gdb-7.9, among other things. The last package and its dependencies provide you with up-to-date Mir headers. Git comes out of the box, IIRC, but if it’s missing just apt-get it.

We need a primer to step on, so here’s my adaptation of Don Bright’s Mir/GLES adaptation of Joe Groff’s OpenGL tutorials, using Daniel van Vugt’s Mir/EGL examples (yes, that’s quite a chain):

I’ve taken the liberty to expand on the work of those gentlemen by bringing the Mir integration up to date, handling Touch’s novelty Desktop Mode and throwing in my own dusty GLES sample code, for good measure.

To build and install the primer, just do:

That will provide you with an original police-car flashing-lights primer. An alternative primer featuring tangential-space bump-mapping can be built by passing arg ‘guest’ to the build script:

Both versions of the primer use a fundamentally identical interface — a resource-initialization procedure and a frame-drawing procedure — so it’s not much of an effort to use the respective routines from either primer in the framework of the host app hello.cpp, and thus get a running render loop.
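As a rough illustration, that shared interface boils down to something like the following C++ sketch (function names and the three-frame cutoff are illustrative, not the actual symbols from hello.cpp):

```cpp
// Hypothetical primer interface: one resource-initialization entry point
// and one per-frame draw entry point, called from the host's render loop.
bool init_resources() {
    // In the real primer: compile shaders, create buffers/textures.
    return true;
}

bool render_frame(unsigned frame) {
    // In the real primer: update uniforms, issue draw calls.
    // For this sketch, pretend the app quits after 3 frames.
    return frame < 3;
}

// The host app's render loop: initialize once, then draw until done
// (eglSwapBuffers() would follow each render_frame() call).
int run_loop() {
    if (!init_resources())
        return -1;
    unsigned frames = 0;
    while (render_frame(frames))
        ++frames;
    return static_cast<int>(frames);
}
```

Swapping one primer for the other then only means plugging different bodies into the same two entry points.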

A few words about the peculiarities of GLES development for Ubuntu Touch. It took me some time to show anything on screen, despite the fact that I had a valid draw context and a render loop soon after the primer built successfully. The reason is that Unity8 on the Touch will not simply let you run a window-painting app from the terminal – you would get your Mir and EGL contexts alright, but the target surface will never be composited to the screen of the device upon eglSwapBuffers() unless you take certain actions. You have two alternatives here:

  • Produce a valid Click package from your app and subsequently install that to the Apps pane (what our build script does), where you can launch from an icon, or…
  • Use a launcher app to start your window app (info courtesy of Daniel van Vugt):

Unfortunately, the second (much quicker and more convenient) approach is not currently usable due to a bug, so we’ll stick with the first. Any command-line args we’d want to pass to the app will need to be written to the app’s .desktop file, which can be found at the official app location after installation:

In that file, set the desired args on the ‘Exec’ line, like this:

Another peculiarity was that in Desktop Mode the app window does a classical ‘zoom to full size’ animation at start. Nothing extraordinary in that, if not for the fact that the Mir surface itself resizes along with the window. Now, a default viewport in a GLES context spans the geometry of the target surface at the time of its creation, which, in our case, is the start of the window-zoom animation, with its tiny surface geometry. One needs to wait for the zoom animation to finish, and then set the viewport geometry to the final geometry of the Mir surface, or live with a postage-stamp-sized output in the lower left corner of the window if the viewport is left unchanged.
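The fix amounts to a small per-frame check, sketched below in C++ (a hypothetical helper, not code from the actual primer; in the real app the returned geometry would be fed to glViewport()):

```cpp
// Tracks the GL viewport and updates it when the Mir surface is resized,
// e.g. at the end of the Desktop Mode window-zoom animation.
struct Viewport {
    int x, y, width, height;
};

// GL viewports are anchored at the lower-left corner and here span the
// full surface.
Viewport viewport_for_surface(int surface_w, int surface_h) {
    return Viewport{0, 0, surface_w, surface_h};
}

// Call once per frame with the current surface geometry; returns true if
// the viewport changed (the real app would then call glViewport(...)).
bool update_viewport_if_resized(Viewport& current, int surface_w, int surface_h) {
    if (current.width == surface_w && current.height == surface_h)
        return false;  // geometry unchanged, nothing to do
    current = viewport_for_surface(surface_w, surface_h);
    return true;
}
```

Skipping this check is what produces the tiny output stuck at the initial animation-start geometry.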

Once we get past those teething hurdles we actually get quite a nicely behaving full-screen app on our hands – it composites smoothly with all other Ubuntu Touch desktop elements like the Launcher tab at the desktop’s left edge and the pull-down Indicator pane on the right (see screenshot). Our app even does live output to the Scopes selector screen (i.e. the task-switching screen) — behold the miracles of modern-day screen compositors! ; )

Click for Original Size (1920×1080)

But hey, don’t just take my word for it – try out GLES coding on a Ubuntu Touch device – you have the basics covered:

  • App’s rendering loop and the entirety of the flashing-screen primer are in hello.cpp
  • Mir context creation and subsequent EGL context binding are in eglapp.cpp
  • Bump-mapping primer is entirely in app_sphere.cpp
  • Various helpers are spread across util_* TUs and hello.cpp
  • All files necessary for the generation of the Click package are in the resource folder.

In conclusion, self-sustained development on the Ubuntu Touch is a perfectly viable scenario (take that, iOS!). Moreover, the GPU in the BQ M10 turned out to have a very nice modern GLES3 (3.1) stack – see excerpts from the app logs below. Actually, this is my first portable device with a GLES 3.1 stack, so I haven’t started using it properly yet — the GLES2 primer above doesn’t make use of the new functionality.

If I have to complain about something from the development of this primer, it’d be that I couldn’t use my arm64 code on the primer, since there are only armhf (32-bit) EGL/GLES libraries available for the Touch. So 64-bit code on the Ubuntu Touch remains in console land for now.

Excerpts from the primer logs:

egl version, vendor, extensions:

1.4 Android META-EGL
Android
EGL_KHR_get_all_proc_addresses EGL_ANDROID_presentation_time EGL_KHR_image EGL_KHR_image_base EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_gl_renderbuffer_image EGL_KHR_fence_sync EGL_KHR_create_context EGL_ANDROID_image_native_buffer EGL_KHR_wait_sync EGL_ANDROID_recordable EGL_HYBRIS_native_buffer2 EGL_HYBRIS_WL_acquire_native_buffer EGL_WL_bind_wayland_display

gl version, vendor, renderer, glsl version, extensions:

OpenGL ES 2.0 (OpenGL ES 3.1)
ARM
Mali-T720
OpenGL ES GLSL ES 3.10
GL_EXT_debug_marker GL_ARM_rgba8 GL_ARM_mali_shader_binary GL_OES_depth24 GL_OES_depth_texture GL_OES_depth_texture_cube_map GL_OES_packed_depth_stencil GL_OES_rgb8_rgba8 GL_EXT_read_format_bgra GL_OES_compressed_paletted_texture GL_OES_compressed_ETC1_RGB8_texture GL_OES_standard_derivatives GL_OES_EGL_image GL_OES_EGL_image_external GL_OES_EGL_sync GL_OES_texture_npot GL_OES_vertex_half_float GL_OES_required_internalformat GL_OES_vertex_array_object GL_OES_mapbuffer GL_EXT_texture_format_BGRA8888 GL_EXT_texture_rg GL_EXT_texture_type_2_10_10_10_REV GL_OES_fbo_render_mipmap GL_OES_element_index_uint GL_EXT_shadow_samplers GL_OES_texture_compression_astc GL_KHR_texture_compression_astc_ldr GL_KHR_texture_compression_astc_hdr GL_KHR_debug GL_EXT_occlusion_query_boolean GL_EXT_disjoint_timer_query GL_EXT_blend_minmax GL_EXT_discard_framebuffer GL_OES_get_program_binary GL_OES_texture_3D GL_EXT_texture_storage GL_EXT_multisampled_render_to_texture GL_OES_surfaceless_context GL_OES_texture_stencil8 GL_EXT_shader_pixel_local_storage GL_ARM_shader_framebuffer_fetch GL_ARM_shader_framebuffer_fetch_depth_stencil GL_ARM_mali_program_binary GL_EXT_sRGB GL_EXT_sRGB_write_control GL_EXT_texture_sRGB_decode GL_KHR_blend_equation_advanced GL_OES_texture_storage_multisample_2d_array GL_OES_shader_image_atomic

GL_MAX_TEXTURE_SIZE: 8192
GL_MAX_CUBE_MAP_TEXTURE_SIZE: 4096
GL_MAX_VIEWPORT_DIMS: 8192, 8192
GL_MAX_RENDERBUFFER_SIZE: 8192
GL_MAX_VERTEX_ATTRIBS: 16
GL_MAX_VERTEX_UNIFORM_VECTORS: 1024
GL_MAX_VARYING_VECTORS: 15
GL_MAX_FRAGMENT_UNIFORM_VECTORS: 1024
GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS: 48
GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS: 16
GL_MAX_TEXTURE_IMAGE_UNITS: 16

Imagination PowerVR G6230 is the First GPU To Pass Khronos OpenVX 1.1 Conformance

December 19th, 2016 3 comments

The Khronos Group is the non-profit consortium behind open standards and APIs for graphics, media and parallel computation, such as OpenGL for 3D graphics, OpenCL for GPGPU, OpenVG for 2D vector graphics, etc… OpenVX is one of their most recent open, royalty-free standards, and targets power-optimized acceleration of computer vision applications such as face, body and gesture tracking, smart video surveillance, advanced driver assistance systems (ADAS), object and scene reconstruction, augmented reality, visual inspection, robotics and more. The first revision of the standard was released in 2014, and the latest OpenVX 1.1 revision was just released in May 2016.

We’ve already seen OpenVX 1.1 support in the Nvidia Jetson TX1 module & board, but Khronos has a conformance program to test implementations and, if successful, allow companies to use the logo and name of the API. The first GPU to pass OpenVX 1.1 conformance is Imagination Technologies’ PowerVR G6230, using an Allwinner A80 hardware platform and Imagination’s OpenVX DDK. OpenVX 1.1 is said to “expand node functionality and enhance the graph framework” compared to OpenVX 1.0.


OpenVX Framework Efficiency

Imagination Technologies also has a blog post with further details including a comparison between OpenCV open source software and OpenVX.

ARM Introduces Bifrost Mali-G51 GPU, and Mali-V61 4K H.265 & VP9 Video Processing Unit

November 1st, 2016 4 comments

Back in May of this year, ARM unveiled the Mali-G71 GPU for premium devices, the company’s first GPU based on the Bifrost architecture. The company has now introduced the second Bifrost GPU, Mali-G51, targeting augmented & virtual reality and the higher resolution screens to be found in mainstream devices in 2018, as well as the Mali-V61 VPU with 4K H.265 & VP9 video decode and encode capabilities, previously known under the codename “Egil“.

Mali-G51 GPU

Click to Enlarge

Click to Enlarge

ARM Mali-G51 will be 60% more energy efficient and have 60% more performance density compared to the Mali-T830 GPU, making it the most efficient ARM GPU to date. It will also be 30% smaller, and support 1080p to 4K displays.

Under the hood, Mali-G51 includes an updated Bifrost low-level instruction set and a dual-pixel shader core per GPU core to deliver twice the texel and pixel rates; it features the latest ARM Frame Buffer Compression (AFBC) 1.2, and supports the Vulkan, OpenGL ES 3.2, and OpenCL 2.0 APIs.

More information can be found on the product page, and an ARM community blog post entitled “The Mali-G51 GPU brings premium performance to mainstream mobile“.

Mali-V61 VPU


Mali-V61 can scale from 1 to 8 cores to handle 1080p60 up to 4K @ 120 fps, and supports 8-/10-bit HEVC & 8-/10-bit VP9 encoding and decoding up to 4K UHD, making it ideal for 4K video conferencing and chat, as well as 32MP multi-shot @ 20 fps.

The company claims H.265 and VP9 video encoding quality is about the same for a given bitrate with Mali-V61 as shown in the diagram below.

Click to Enlarge

VP9 vs HEVC vs H.264 – Click to Enlarge

Besides selecting 1 to 8 cores, silicon vendors can also decide whether they need the encoding or decoding blocks for their SoC. For example, a camera SoC may not need video decoding support, while STB SoCs might do without encoding. While Mali-V61 is a premium IP block, ARM also expects it in mainstream devices, possibly featuring Cortex A53 processor cores and a Mali-G51 GPU.

You’ll find more details on the product page, and in the ARM community blog post “Mali-V61 – Premium video processing for Generation Z and beyond”.

This Video Shows Vulkan API’s Higher Power Efficiency Compared to OpenGL ES API on ARM SoCs

October 20th, 2016 1 comment

Vulkan was introduced as the successor of OpenGL ES in March 2015, promising to take less CPU resources and to support multiple command buffers that can be created in parallel and distributed over several cores, at the cost of slightly more complex application programming, since less software work is done inside the GPU drivers themselves, with app developers needing to handle memory allocation and thread management.

This was just a standard at the time, so implementations still needed some time, and work is still in progress, but ARM showcased the power efficiency of Vulkan over OpenGL ES in the video embedded at the end of this post.

The demo has the same graphics details and performance using both OpenGL ES and Vulkan, but since the load on the CPU in that demo can be distributed over several CPU cores with Vulkan, against a single core for OpenGL ES, it’s possible to use low-power cores (e.g. Cortex A53) operating at a lower frequency and voltage, hence reducing power consumption.
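The multi-core distribution described above can be sketched generically in C++ — note these are plain std::thread calls standing in for Vulkan’s per-thread command-buffer recording (one command pool per thread), not actual Vulkan API usage:

```cpp
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Stand-in for recording a range of draw commands into one command buffer;
// each int here pretends to be one recorded command.
std::vector<int> record_range(int first, int count) {
    std::vector<int> cmds(count);
    std::iota(cmds.begin(), cmds.end(), first);
    return cmds;
}

// Each worker thread records its own "command buffer" in parallel — the
// pattern Vulkan allows, whereas OpenGL ES funnels all work through a
// single context thread. Returns the total number of recorded commands.
std::size_t record_parallel(int total_cmds, unsigned workers) {
    std::vector<std::vector<int>> buffers(workers);
    std::vector<std::thread> pool;
    const int per_worker = total_cmds / static_cast<int>(workers);
    for (unsigned w = 0; w < workers; ++w)
        pool.emplace_back([&, w] {
            buffers[w] = record_range(static_cast<int>(w) * per_worker,
                                      per_worker);
        });
    for (auto& t : pool)
        t.join();  // "submit" only once all buffers are recorded
    std::size_t n = 0;
    for (const auto& b : buffers)
        n += b.size();
    return n;
}
```

Spreading the same total work over several slow, low-voltage cores instead of one fast core is what yields the energy savings measured in the demo.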

ARM also measured that the complete OpenGL ES demo used 1270 joules against 1123 joules for the Vulkan demo, resulting in about 12% energy savings in this “early stage” demo.

Categories: Android, Video Tags: gpu, opengl, power, vulkan

HiSilicon Kirin 960 Octa Core Application Processor Features ARM Cortex A73 & A53 Cores, Mali G71 MP8 GPU

October 20th, 2016 2 comments

Following on the Kirin 950 processor found in Huawei Mate 8, P9, P9 Max & Honor 8 smartphones, Hisilicon has now unveiled the Kirin 960 octa-core processor with four ARM Cortex A73 cores, four Cortex A53 low-power cores, a Mali G71 MP8 GPU, and an LTE Cat. 12 modem.


The table below from Anandtech compares features and specifications of Kirin 950 against the new Kirin 960 processor.

| SoC | Kirin 950 | Kirin 960 |
|---|---|---|
| CPU | 4x Cortex A72 (2.3 GHz) + 4x Cortex A53 (1.8 GHz) | 4x Cortex A73 (2.4 GHz) + 4x Cortex A53 (1.8 GHz) |
| Memory Controller | LPDDR3-933 or LPDDR4-1333 (hybrid controller) | LPDDR4-1800 |
| GPU | ARM Mali-T880MP4 @ 900 MHz | ARM Mali-G71MP8 @ 900 MHz |
| Interconnect | ARM CCI-400 | ARM CCI-550 |
| Encode/Decode | 1080p H.264 decode & encode, 2160p30 HEVC decode | 2160p30 HEVC & H.264 decode & encode, 2160p60 HEVC decode |
| Camera/ISP | Dual 14-bit ISP, 940MP/s | Improved dual 14-bit ISP |
| Sensor Hub | i5 | i6 |
| Storage | eMMC 5.0 | UFS 2.1 |
| Integrated Modem | Balong integrated UE Cat. 6 LTE | Integrated UE Cat. 12 LTE, 4x CA, 4×4 MIMO |

ARM claims a 30% “sustained” performance improvement between Cortex A72 and Cortex A73, but the GPU should be where the performance jump is more significant, as ARM promises a 50 percent increase in graphics performance and a 20 percent improvement in power efficiency with Mali G71 compared to the previous generation (Mali-T880). Kirin 960 also integrates twice the GPU cores compared to Kirin 950, and some GPU benchmarks provided by Hisilicon/Huawei confirm the theory with over 100% performance improvement in both Manhattan 1080p offscreen and T-Rex offscreen GFXBench 4.0 benchmarks.

The first smartphone to feature Kirin 960 is likely to be the Huawei Mate 9, rumored to come with a 5.9″ 2K display, 6GB RAM, and 256 GB UFS flash.

Imagination Technologies Announces MIPS Warrior I-class I6500 Heterogeneous CPU with up to 384 Cores

October 13th, 2016 No comments

Imagination has just unveiled the successor of the MIPS I6400 64-bit Warrior core: the MIPS Warrior I-class I6500 heterogeneous CPU, supporting up to 64 clusters with up to 6 cores each (384 cores max), each core running up to 4 threads (1,536 threads max), combined with I/O coherence units (IOCUs) and external IP such as PowerVR GPUs or other hardware accelerators.

The main features of the MIPS I6500 processor are listed as follows:

 

  • Heterogeneous Inside – In a single cluster, designers can optimize power consumption with the ability to configure each CPU with different combinations of threads, different cache sizes, different frequencies, and even different voltage levels.
  • Heterogeneous Outside – The latest MIPS Coherence Manager with an AMBA ACE interface to popular ACE coherent fabric solutions such as those from Arteris and Netspeed lets designers mix on a chip configurations of processing clusters – including PowerVR GPUs or other accelerators – for high system efficiency.
  • Simultaneous Multi-threading (SMT) – Based on a superscalar dual-issue design implemented across generations of MIPS CPUs, this feature enables execution of multiple instructions from multiple threads every clock cycle, providing higher utilization and CPU efficiency.
  • Hardware virtualization (VZ) – I6500 builds on the real-time hardware virtualization capability pioneered in the MIPS I6400 core. Designers can save costs by safely and securely consolidating workloads from multiple CPU cores onto a single core, save power where multiple cores are required, and dynamically and deterministically allocate CPU bandwidth per application.
  • SMT + VZ – The combination of SMT with VZ in the I6500 offers “zero context switching” for applications requiring real-time response. This feature, alongside the provision of scratchpad memory, makes the I6500 ideal for applications which require deterministic code execution.
  • Designed for compute intensive, data processing and networking applications – The I6500 is designed for high-performance/high-efficiency data transfers to localized compute resources with data scratchpad memories per CPU, and features for fast path message/data passing between threads and cores.
  • OmniShield-ready – Imagination’s multi-domain security technology used across its processing families enables isolation of applications in trusted environments, providing a foundation for security by separation.

The processor is also based on the standard MIPS ISA, so developers will be able to leverage existing software and tools such as compilers, debuggers, operating systems, hypervisors, and application software already optimized for the MIPS ISA.

mips-i6500-soc

 

The figure above shows what an SoC based on MIPS I6500 may look like, with one cluster with 4 CPU cores and 2 IOCUs, another cluster without any CPU cores but instead eight IOCUs interlinked with third-party accelerators, and one PowerVR GPU.

Target applications include advanced driver assistance systems (ADAS), autonomous vehicles, networking, drones, industrial automation, security, video analytics, machine learning, and more. One of the first customers for the new processor is Mobileye, whose EyeQ5 SoC, designed for Fully Autonomous Driving (interestingly shortened as “FAD”) vehicles, will combine eight multi-threaded MIPS CPU cores with eighteen cores of Mobileye’s Vision Processors (VPs). The EyeQ5 SoC should be found in vehicles as early as 2021.

The MIPS I6500 CPU can be licensed now, with general availability planned for Q1 2017. You’ll find more technical details on the product page, and in the announcement blog post.