Arm Announces Cortex-A78 CPU, Mali-G78 GPU, Ethos-N78 NPU and Custom Cortex-X1 Core

Arm has just announced its 2020 Arm Mobile IP portfolio with no fewer than five IP blocks, including the Arm Cortex-A78 CPU, Arm Mali-G78 and Mali-G68 GPUs, the Arm Ethos-N78 neural processing unit, and the custom Cortex-X program starting with Cortex-X1, the most powerful Arm core to date.

Arm Cortex-A78 CPU

Arm Cortex A78 Block Diagram

Cortex-A78 highlights:

  • Architecture –  Armv8-A (Harvard)
  • Extensions – Armv8.1, Armv8.2, Cryptography, and RAS; Armv8.3 (LDAPR instructions only)
  • ISA support – A64, A32, and T32 (at EL0 only)
  • Microarchitecture
    • Pipeline – Out of order
    • Superscalar
    • Neon / Floating Point Unit
    • Optional cryptography Unit
    • Max number of CPUs in cluster – 4
    • Physical Addressing (PA) – 40-bit
  • Memory system and external interfaces
    • 32KB to 64KB L1 I-Cache / D-Cache
    • 256KB to 512KB L2 Cache
    • Optional 512KB to 4MB L3 Cache
    • ECC and LPAE support
    • TrustZone security
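
To put the 40-bit physical addressing figure from the list above into perspective, the short sketch below computes the size of the address space it implies. This is illustrative arithmetic only; the variable names are ours, not Arm's, and how much memory a given SoC actually exposes depends on the implementation.

```python
# Illustrative arithmetic only: size of a 40-bit physical address space.
PA_BITS = 40  # physical address width taken from the Cortex-A78 highlights

addressable_bytes = 2 ** PA_BITS
addressable_tib = addressable_bytes / 2 ** 40  # 1 TiB = 2**40 bytes

print(f"{PA_BITS}-bit PA covers {addressable_bytes:,} bytes ({addressable_tib:g} TiB)")
# prints: 40-bit PA covers 1,099,511,627,776 bytes (1 TiB)
```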

Cortex-A78 vs Cortex-A77 Performance Improvement

Cortex-A78 delivers 20% more performance than Cortex-A77 at the same power budget (one watt), but peak performance is only about 7% higher, and machine learning performance is basically the same. So the real benefit of Cortex-A78 is higher efficiency, which should lead to either more sustained performance or longer battery life, with Cortex-A78 consuming 50% less power than Cortex-A77 at the same performance.
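
The two efficiency claims can look inconsistent at first glance, so here is a minimal sketch that turns each into a performance-per-watt ratio against Cortex-A77. The helper name `perf_per_watt_ratio` is hypothetical and the inputs are only the percentages quoted above. We assume the claims are measured at different operating points, which the announcement does not spell out.

```python
def perf_per_watt_ratio(perf_ratio: float, power_ratio: float) -> float:
    """Relative performance-per-watt of a new core vs. a baseline (baseline = 1.0)."""
    return perf_ratio / power_ratio

# Claim 1: 20% more performance at the same (1 W) power budget.
iso_power = perf_per_watt_ratio(perf_ratio=1.20, power_ratio=1.00)

# Claim 2: the same performance at 50% less power.
iso_perf = perf_per_watt_ratio(perf_ratio=1.00, power_ratio=0.50)

print(f"iso-power efficiency gain:       {iso_power:.2f}x")
print(f"iso-performance efficiency gain: {iso_perf:.2f}x")
```

The two ratios differ (1.2x against 2x), which is plausible because power usually rises faster than performance as frequency and voltage go up, so the gain depends on where on the curve the comparison is made.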

Cortex-A78 vs Cortex-A77

We’ll provide more data and charts below in our comparison with the Cortex-X1 processor.

More details about Cortex-A78 can be found on the product page, announcement post, and developer website.

Arm Mali-G78 GPU

Arm Mali-G78 GPU Block Diagram

As usual, Arm announces an accompanying GPU, display processor, and NPU with its latest Arm Cortex-A core, this time with the Mali-G78 GPU offering the following key features and specifications:

  • Architecture – Second-generation Valhall architecture
  • Number of Cores – 7 to 24 cores
  • API support – OpenGL ES 1.1, 2.0, 3.1, 3.2; Vulkan 1.1, 1.2; OpenCL 1.1, 1.2, 2.0 Full profile
  • AMBA 4 ACE, ACE-LITE, and AXI bus interface
  • Configurable 512KB – 2MB L2 cache
  • 4x/8x/16x MSAA Anti-aliasing
  • Adaptive Scalable Texture Compression (ASTC) – Low Dynamic Range (LDR) and High Dynamic Range (HDR).
  • Arm Frame Buffer Compression (AFBC) v1.3

Compared to Mali-G77, Mali-G78 is said to provide a GPU performance boost of up to 25% and to improve on-device ML capabilities by up to 15%. Mali-G78 is also more efficient, with a new Fused Multiply-Add (FMA) unit in the execution engine leading to a 30% energy reduction in that unit. The new Asynchronous Top Level feature, together with tiler and fragment dependency tracking improvements, plays a key role in increasing the performance of PC-like games such as Fortnite and PUBG.


There’s also a new “sub-premium” Mali-G68 GPU with many of the same features as Mali-G78 but limited to 6 cores for lower costs and power consumption.

More information can be found on the blog post, product page, and developer’s website.

Ethos-N78 NPU

Ethos N78 Performance

The Ethos-N78 NPU supports up to 90 different configurations with performance ranging from 1 TOPS to 10 TOPS, and customizable area (inferences/s/mm²), throughput (inferences/s), and average DRAM bandwidth (GB/s). The new NPU also delivers up to twice the peak performance of Ethos-N77, 25% better performance efficiency, and 40% greater DRAM bandwidth efficiency.

Visit the product page or read Arm’s blog post for more details.

Arm Cortex-X1 and Arm Cortex-X Custom Program

Usually, Arm would stop here with its new IP announcements, but this year is a little different. The company has also introduced the Cortex-X Custom (CXC) program, where partners can work with Arm engineers to design a CPU that closely meets their specific requirements and goes beyond the usual Cortex-A performance, power, and area (PPA) envelope.

Cortex-A78 vs Cortex-X1 Features Comparison

The first CPU from the CXC program is the Arm Cortex-X1. It brings a 30% peak performance improvement over the Arm Cortex-A77 CPU and 22% over the just-announced Cortex-A78 core. The Cortex-X1 also delivers twice the machine learning (ML) performance of Cortex-A77.
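
As a rough cross-check, the peak-performance figures quoted in this article can be chained together: if Cortex-A78 is about 7% faster than Cortex-A77 at peak, and Cortex-X1 is 22% faster than Cortex-A78, the product should land near the quoted 30% gain of Cortex-X1 over Cortex-A77. The sketch below does only that multiplication. The constant names are ours, and the inputs are rounded marketing figures, so only approximate agreement should be expected.

```python
# Peak-performance ratios as quoted in the article (rounded figures).
A78_OVER_A77 = 1.07        # "about 7% faster" at peak
X1_OVER_A78 = 1.22         # 22% over Cortex-A78
X1_OVER_A77_QUOTED = 1.30  # 30% over Cortex-A77

# Chain the two generational steps to get the implied X1-vs-A77 ratio.
x1_over_a77_implied = A78_OVER_A77 * X1_OVER_A78

print(f"implied X1 vs A77 peak gain: {(x1_over_a77_implied - 1) * 100:.1f}%")
print(f"quoted  X1 vs A77 peak gain: {(X1_OVER_A77_QUOTED - 1) * 100:.1f}%")
```

The implied figure comes out at roughly 30.5%, within rounding of the quoted 30%, so the three numbers are consistent with each other.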

Cortex-A77 vs Cortex-A78 vs Cortex-X1

It’s also possible to make full use of DynamIQ technology by combining one Cortex-X1 core with Cortex-A78 and Cortex-A55 cores to bring a specific boost to single-core performance (+30%), at the cost of a larger cluster area due to the more powerful core and larger 8MB L3 cache.

Cortex-X1 DynamIQ

There’s neither a product page nor developer information for Cortex-X1 just yet, so it may take a bit longer to come to market. You can still read more about it in the Arm Community blog post.
