Arm Announces Cortex-A76 CPU with Laptop-class Performance, Mali-G76 GPU, Mali-V76 8K VPU

Arm Cortex A75 based processors are only found in a few SoCs and devices, but Arm keeps on innovating, and they’ve now announced a new suite of IP with the Cortex-A76 CPU, which delivers 35 percent more performance, and the Mali-G76 GPU, with ML support and 30 percent higher efficiency and performance.

SoCs based on the new CPU and GPU IP will provide “laptop-class” performance, and the company also announced the Arm Mali-V76 VPU with support for 8K video decoding and encoding.

Arm Cortex A76


After the Cortex A75, the Arm Cortex-A76 CPU is the second high-performance processor core based on DynamIQ technology. Besides the 35 percent performance gain mentioned in the introduction, it also offers 40 percent improved efficiency and delivers 4x compute performance improvements for AI/ML at the edge.

Highlights of Cortex A76:

  • Architecture – Armv8-A (Harvard) with Armv8.1, Armv8.2, Armv8.3 (LDAPR instructions only), cryptography and RAS extensions
  • ISA support – A64; A32 and T32 (at EL0 only)
  • Microarchitecture
    • Pipeline – Out-of-order
    • Superscalar
    • NEON / Floating Point Unit
    • Optional Cryptography Unit
    • Up to four CPUs in a cluster
    • Physical addressing (PA) – 40-bit
  • Memory system and external interfaces
    • 64KB L1 I-Cache / D-Cache
    • 256KB to 512KB L2 Cache
    • Optional 512KB to 4MB L3 cache
    • ECC Support, LPAE
    • Bus interfaces – AMBA ACE or CHI
    • Optional ACP
    • Optional Peripheral Port
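The Armv8.3 support is limited to LDAPR, the load-acquire instruction from the RCpc (release consistent, processor consistent) extension. It is slightly weaker than the Armv8.0 LDAR instruction, yet still strong enough for C11/C++11 acquire loads. The sketch below is a generic C11 message-passing example, not Arm code: the acquire load in `consumer()` is the kind of operation a compiler may lower to LDAPR when targeting an RCpc-capable core (for example with `-march=armv8.2-a+rcpc`). Whether it actually does so depends on the compiler and its version, so treat that as an assumption. All function and variable names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical example: a plain payload published through an atomic flag. */
static int payload;
static atomic_int ready; /* static storage, so zero-initialized */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;                                           /* ordinary store */
    atomic_store_explicit(&ready, 1, memory_order_release); /* store-release publishes it */
    return NULL;
}

static void *consumer(void *arg)
{
    int *seen = arg;
    /* Load-acquire: with RCpc enabled, a compiler may emit LDAPR here instead of LDAR
       (assumption, compiler dependent). */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ; /* spin until the producer has published */
    assert(payload == 42); /* the release/acquire pairing makes the payload visible */
    *seen = payload;
    return NULL;
}

/* Runs one producer/consumer round; returns the value seen by the consumer, or -1 on error. */
int run_message_passing(void)
{
    pthread_t prod, cons;
    int seen = 0;

    payload = 0;
    atomic_store(&ready, 0);
    if (pthread_create(&prod, NULL, producer, NULL) != 0)
        return -1;
    if (pthread_create(&cons, NULL, consumer, &seen) != 0) {
        pthread_join(prod, NULL);
        return -1;
    }
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return seen;
}
```

The same code compiles and runs on any C11 target; only the instruction chosen for the acquire load differs. Older C libraries may need `-pthread` at link time.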

A Cortex-A76 SoC should provide around twice the performance of a Cortex-A73 SoC in laptops, thanks to the microarchitecture improvements, a smaller process node (7nm vs 16nm), and a higher CPU frequency (up to 3GHz+). big.LITTLE performance within the same 5W power envelope should also be about twice as good.

Some of the key microarchitectural enhancements include:

  • Decoupled branch prediction and instruction fetch – Built to hide latency at high bandwidth, the in-order Cortex-A76 front-end is able to fetch 4 to 8 instructions per cycle, using multi-level branch target caches and a hybrid indirect predictor to sustain maximum throughput.
  • Arm’s first 4-wide decode core, increasing the maximum instructions-per-cycle (IPC) capability. Up to 8 operations per cycle can then be dispatched to the out-of-order core, supporting a wider, area- and power-optimized instruction window.
  • More integer and vector execution throughput – The core integrates quad-issue integer units, including three simple ALUs and one multi-cycle integer unit. Moreover, Cortex-A76 supports dual-issue native 16B (128-bit) vector and floating-point units, twice the throughput of any previous Arm CPU. This is key to delivering the 4x ML performance improvement mentioned earlier.
  • Enhanced memory system – The full cache hierarchy is co-optimized for latency and bandwidth, with a sophisticated fourth-generation prefetcher and deep memory-level parallelism.
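The “dual-issue native 128-bit” vector claim translates into a simple peak-throughput calculation: two vector pipes, each processing one 128-bit register's worth of lanes per cycle. The hypothetical helpers below compute an idealized upper bound only. Counting a fused multiply-add as two operations, and assuming both pipes can issue one every cycle, are assumptions on our part, not figures published by Arm.

```c
#include <assert.h>

/* Peak SIMD lanes processed per cycle by one core, assuming every issued
   instruction is a full-width vector op (idealized upper bound). */
unsigned lanes_per_cycle(unsigned vector_pipes, unsigned vector_bits, unsigned element_bits)
{
    assert(element_bits > 0 && vector_bits % element_bits == 0);
    return vector_pipes * (vector_bits / element_bits);
}

/* Idealized peak operations per second.
   ops_per_lane = 2 models a fused multiply-add per lane (assumption). */
double peak_ops_per_second(unsigned lanes, unsigned ops_per_lane, double clock_hz)
{
    return (double)lanes * ops_per_lane * clock_hz;
}
```

Under these assumptions, one core handles 8 FP32 lanes, 16 FP16 lanes, or 32 INT8 lanes per cycle, and `peak_ops_per_second(8, 2, 3.0e9)` gives an idealized 48 GFLOPS (FP32) at 3GHz. Real workloads will land well below such a ceiling.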
Arm Cortex A76 3GHz

You’ll find more details in a dedicated Arm Community blog post and on the product page.

Arm Mali-G76

Besides the 30% improvement in performance density and energy efficiency, the Bifrost-based Arm Mali-G76 GPU also delivers around 2.7 times better machine learning (ML) performance than the Mali-G72 GPU.

Some of the specifications of Mali-G76 GPU:

  • Anti-Aliasing – 4x MSAA, 8x MSAA, 16x MSAA
  • API Support – OpenGL ES 1.1, 2.0, 3.1, 3.2, Vulkan 1.1, OpenCL 1.1, 1.2, 2.0 Full Profile
  • Bus Interface – AMBA 4 ACE-Lite
  • L2 Cache – 512KB to 4MB
  • Scalability – 4 to 20 Cores
  • Adaptive Scalable Texture Compression (ASTC) – Low Dynamic Range (LDR) and High Dynamic Range (HDR), supports both 2D and 3D images.
  • Arm Frame Buffer Compression (AFBC) – Version 1.2; 4×4 pixel block size
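ASTC always stores a block of texels in 128 bits, so the bitrate is set entirely by the block footprint (from 4×4 up to 12×12 for 2D textures). The hypothetical helpers below compute the resulting bits per pixel and the compressed size of a 2D texture, rounding partial blocks at the image edges up to whole blocks.

```c
#include <assert.h>

/* Every ASTC block occupies 128 bits, whatever its footprint. */
#define ASTC_BLOCK_BITS 128.0
#define ASTC_BLOCK_BYTES 16UL

/* Bits per pixel for a 2D ASTC footprint of w x h texels. */
double astc_bits_per_pixel(unsigned w, unsigned h)
{
    assert(w > 0 && h > 0);
    return ASTC_BLOCK_BITS / (double)(w * h);
}

/* Compressed size in bytes of a width x height 2D texture using w x h blocks;
   partial blocks at the right and bottom edges are rounded up. */
unsigned long astc_texture_bytes(unsigned long width, unsigned long height,
                                 unsigned w, unsigned h)
{
    assert(w > 0 && h > 0);
    unsigned long blocks_x = (width + w - 1) / w;
    unsigned long blocks_y = (height + h - 1) / h;
    return blocks_x * blocks_y * ASTC_BLOCK_BYTES;
}
```

For example, a 1024×1024 texture with 4×4 blocks takes 1 MiB (8 bits per pixel), compared with 4 MiB for uncompressed RGBA8.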
Mali-G76 vs Mali-G72

The GPU will be used in “premium mobile”, virtual reality, machine learning, and automotive applications.

For more information, check out the blog post and product page on Arm’s website.

Arm Mali-V76


Mali-V76 is the latest video processing unit (VPU) from Arm, with support for 8K video decoding at 60 fps. It is also suitable for video walls showing 2×2 4K UHD videos or 4×4 1080p HD videos.
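The video wall examples follow from pixel throughput: an 8K frame (assuming 8K UHD, 7680×4320) contains exactly as many pixels as four 4K UHD frames (3840×2160) or sixteen 1080p frames (1920×1080). The hypothetical `pixel_rate()` helper below checks this, assuming every stream in the wall also runs at 60 fps, which the announcement does not state explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Pixels per second decoded for a group of identical video streams. */
uint64_t pixel_rate(uint32_t width, uint32_t height, uint32_t fps, uint32_t streams)
{
    assert(fps > 0 && streams > 0);
    /* Widen to 64 bits before multiplying to avoid 32-bit overflow. */
    return (uint64_t)width * height * fps * streams;
}
```

All three configurations work out to 1,990,656,000 pixels per second.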

Main features of Mali-V76 VPU:

  • Multi-standard video processor
  • 10/8-bit HEVC, VP9, VP8, H.264, AVS+/AVS and legacy
  • Simultaneous encode and decode
  • Programmability/flexibility
  • Scalable 2-8 cores (8K60D/8K30E)

There’s no mention of the AV1 codec, so we’ll probably have to wait until 2020 or beyond before AV1 makes it into silicon.


Mali-V76 is an evolution of the Mali-V61 video processor, with twice the decode performance, a 40% smaller area for 4K120 performance, 25% additional bitrate savings, twice the bus fabric latency tolerance, and added support for the 10-bit H.264 codec and 8-bit AVS+/AVS decoding.

Again, you can find out more on Arm’s product page and blog post about Mali-V76 VPU.

Support CNX Software! Donate via cryptocurrencies or become a Patron on Patreon