Arm Announces Cortex-A78 CPU, Mali-G78 GPU, Ethos-N78 NPU and Custom Cortex-X1 Core

Arm has just announced its 2020 mobile IP portfolio with no fewer than five IP blocks: the Arm Cortex-A78 CPU, the Arm Mali-G78 and Mali-G68 GPUs, the Arm Ethos-N78 neural processing unit, and the Cortex-X Custom program, which starts with Cortex-X1, the most powerful Arm core to date.

Arm Cortex-A78 CPU

Cortex-A78 highlights:

  • Architecture – Armv8-A (Harvard)
  • Extensions – Armv8.1, Armv8.2, Cryptography, and RAS; Armv8.3 (LDAPR instructions only)
  • ISA support – A64; A32 and T32 (at EL0 only)
  • Microarchitecture
    • Pipeline – Out of order
    • Superscalar
    • Neon / Floating Point Unit
    • Optional cryptography Unit
    • Max number of CPUs in cluster – 4
    • Physical Addressing (PA) – 40-bit
  • Memory system and external interfaces
    • 32KB to 64KB L1 I-Cache / D-Cache
    • 256KB to 512KB L2 Cache
    • Optional 512KB to 4MB L3 Cache
    • ECC and LPAE support
    • Trustzone security

Cortex-A78 delivers 20% more performance than Cortex-A77 within the same one-watt power budget, although peak performance is only about 7% higher and machine learning performance is basically unchanged. The real benefit of Cortex-A78 is therefore higher efficiency, which should translate into either more consistent performance or longer battery life, with Cortex-A78 consuming 50% less power than Cortex-A77 at the same performance.
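As a rough illustration of what those two efficiency figures mean, the sketch below turns them into performance-per-watt ratios. The function and variable names are ours, and the inputs are simply Arm's headline claims quoted above, each taken at a different operating point, so treat the result as back-of-the-envelope arithmetic rather than a benchmark.

```python
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Return the A78/A77 performance-per-watt ratio.

    perf_ratio  -- A78 performance divided by A77 performance
    power_ratio -- A78 power divided by A77 power
    """
    return perf_ratio / power_ratio


# Claim 1: 20% more performance at the same power budget (one watt).
iso_power_gain = perf_per_watt_gain(perf_ratio=1.20, power_ratio=1.00)

# Claim 2: 50% less power at the same performance.
iso_perf_gain = perf_per_watt_gain(perf_ratio=1.00, power_ratio=0.50)

print(f"Same power:       {iso_power_gain:.2f}x performance per watt")
print(f"Same performance: {iso_perf_gain:.2f}x performance per watt")
```

The two ratios differ (1.20x versus 2.00x) because power does not scale linearly with performance: backing off from the top of the frequency range typically saves proportionally more power than the performance it gives up.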

We’ll provide more data and charts below in our comparison with the Cortex-X1 processor.

More details about Cortex-A78 can be found on the product page, announcement post, and developer website.

Arm Mali-G78 GPU

As usual, Arm has announced an accompanying GPU, display processor, and NPU alongside its latest Arm Cortex-A core, this time the Mali-G78 GPU with the following key features and specifications:

  • Architecture – Second-generation Valhall architecture
  • Number of Cores – 7 to 24 cores
  • API support – OpenGL ES 1.1, 2.0, 3.1, 3.2; Vulkan 1.1, 1.2; OpenCL 1.1, 1.2, 2.0 Full profile
  • AMBA 4 ACE, ACE-LITE, and AXI bus interface
  • Configurable 512KB – 2MB L2 cache
  • 4x/8x/16x MSAA Anti-aliasing
  • Adaptive Scalable Texture Compression (ASTC) – Low Dynamic Range (LDR) and High Dynamic Range (HDR).
  • Arm Frame Buffer Compression (AFBC) v1.3

Compared to Mali-G77, Mali-G78 is said to provide a GPU performance boost of up to 25% and to improve on-device ML capabilities by up to 15%. Mali-G78 is also more efficient, with a new Fused Multiply-Add (FMA) unit in the execution engine that leads to a 30% energy reduction in the unit. The new Asynchronous Top Level feature, together with tiler and fragment dependency tracking improvements, plays a key role in increasing the performance of PC-like games such as Fortnite and PUBG.

There’s also a new “sub-premium” Mali-G68 GPU with many of the same features as Mali-G78 but limited to 6 cores for lower costs and power consumption.

More information can be found on the blog post, product page, and developer’s website.

Ethos-N78 NPU

The Ethos-N78 NPU supports up to 90 different configurations with performance ranging from 1 TOPS to 10 TOPS, and can be tuned for area efficiency (inferences/s/mm²), throughput (inferences/s), and average DRAM bandwidth (GB/s). The new NPU also delivers up to twice the peak performance of Ethos-N77, 25% better performance efficiency, and 40% greater DRAM bandwidth efficiency.

Visit the product page or read Arm’s blog post for more details.

Arm Cortex-X1 and Arm Cortex-X Custom Program

Usually, Arm would stop here with its new IP announcements, but this year is a little different, as the company has also introduced the Cortex-X Custom (CXC) program, where partners can work in collaboration with Arm engineers to design a CPU that closely meets their specific requirements and goes beyond the usual Cortex-A performance, power, and area (PPA) envelope.

Cortex-A78 vs Cortex-X1 Features Comparison

The first CPU to come out of the CXC program is the Arm Cortex-X1. It brings a 30% peak performance improvement over the Arm Cortex-A77 CPU and 22% over the just-announced Cortex-A78 core. The Cortex-X1 also delivers twice the machine learning (ML) performance of Cortex-A77.
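Those two percentages can be cross-checked against the roughly 7% peak gain quoted earlier for Cortex-A78 over Cortex-A77. The short sketch below does the arithmetic; the variable names are ours, and the inputs are only the rounded headline figures from the announcement, so a small discrepancy is expected.

```python
# Peak performance ratios taken from the announcement (rounded figures).
x1_vs_a77 = 1.30  # Cortex-X1 is 30% faster than Cortex-A77
x1_vs_a78 = 1.22  # Cortex-X1 is 22% faster than Cortex-A78

# If both ratios describe the same workload, dividing them gives the
# implied Cortex-A78 vs Cortex-A77 peak performance ratio.
a78_vs_a77 = x1_vs_a77 / x1_vs_a78
implied_gain_pct = (a78_vs_a77 - 1) * 100

print(f"Implied A78 peak gain over A77: {implied_gain_pct:.1f}%")
# prints "Implied A78 peak gain over A77: 6.6%"
```

The implied gain of about 6.6% lines up with the "about 7%" figure, which suggests the three numbers are at least internally consistent.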

It’s also possible to make full use of DynamIQ technology by combining one Cortex-X1 core with Cortex-A78 and Cortex-A55 cores, which brings a targeted boost to single-core performance (+30%) at the cost of a larger cluster area due to the more powerful core and the larger 8MB L3 cache.

There’s neither a product page nor developer information for Cortex-X1 just yet, so it may take a bit longer to come to market. You can still read more about it on the Arm community’s blog post.
