Arm Announces Cortex-A78 CPU, Mali-G78 GPU, Ethos-N78 NPU and Custom Cortex-X1 Core

Arm has just announced its 2020 Arm Mobile IP portfolio with no fewer than five IP blocks: the Cortex-A78 CPU, the Mali-G78 and Mali-G68 GPUs, the Ethos-N78 neural processing unit, and the Cortex-X1, the first core of the new Cortex-X Custom program and the most powerful Arm core to date.

Arm Cortex-A78 CPU

Arm Cortex-A78 Block Diagram

Cortex-A78 highlights:

  • Architecture – Armv8-A (Harvard)
  • Extensions – Armv8.1, Armv8.2, Cryptography, and RAS; Armv8.3 (LDAPR instructions only)
  • ISA support – A64, A32, and T32 (at EL0 only)
  • Microarchitecture
    • Pipeline – Out of order
    • Superscalar
    • Neon / Floating Point Unit
    • Optional cryptography Unit
    • Max number of CPUs in cluster – 4
    • Physical Addressing (PA) – 40-bit
  • Memory system and external interfaces
    • 32KB to 64KB L1 I-Cache / D-Cache
    • 256KB to 512KB L2 Cache
    • Optional 512KB to 4MB L3 Cache
    • ECC and LPAE support
    • TrustZone security
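The 40-bit physical addressing figure in the list above determines how much physical memory a Cortex-A78 can address. A minimal sketch of that arithmetic in Python (the helper names are hypothetical, not part of any Arm tool):

```python
# Sketch: physical memory reachable with a given number of physical address (PA) bits.
# The 40-bit value comes from the Cortex-A78 spec list; helper names are hypothetical.

def addressable_bytes(pa_bits: int) -> int:
    """Number of distinct byte addresses reachable with `pa_bits` address bits."""
    return 1 << pa_bits


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    pa = addressable_bytes(40)
    print(pa, to_gib(pa))  # 1099511627776 1024.0
```

In other words, 40 PA bits cover 1 TiB of physical address space, which is well beyond the RAM in any phone.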

Cortex-A78 vs Cortex-A77 Performance Improvement

Cortex-A78 delivers 20% more performance than Cortex-A77 within the same power budget (one watt), but peak performance is only about 7% higher, and machine learning performance is basically unchanged. The real benefit of Cortex-A78 is therefore efficiency: at the same performance level it consumes 50% less power than Cortex-A77, which should translate into either more consistent performance or longer battery life.
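The 20% and 50% figures describe two different operating points on the power/performance curve, so they are easier to compare once both are expressed as performance per watt. A minimal sketch, assuming the ratios are exactly as quoted (Cortex-A78 relative to Cortex-A77) and modeling nothing else about the curve:

```python
def perf_per_watt_ratio(perf_ratio: float, power_ratio: float) -> float:
    """Relative perf/W of a new core vs an old one, given perf and power ratios (new / old)."""
    return perf_ratio / power_ratio


# Iso-power point: same 1 W budget, 20% more performance.
iso_power = perf_per_watt_ratio(perf_ratio=1.20, power_ratio=1.0)

# Iso-performance point: same performance, 50% less power.
iso_perf = perf_per_watt_ratio(perf_ratio=1.0, power_ratio=0.5)

print(f"{iso_power:.2f}x at iso-power, {iso_perf:.2f}x at iso-performance")
# 1.20x at iso-power, 2.00x at iso-performance
```

The two results differ because they are taken at different operating points; neither says anything about perf/W at peak frequency.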

Cortex-A78 vs Cortex-A77

We’ll provide more data and charts below in our comparison with the Cortex-X1 processor.

More details about Cortex-A78 can be found on the product page, announcement post, and developer website.

Arm Mali-G78 GPU

Arm Mali-G78 GPU Block Diagram

As usual, Arm announces accompanying GPU, display processor, and NPU IP alongside its latest Cortex-A core. This year's GPU is the Mali-G78, with the following key features and specifications:

  • Architecture – Second-generation Valhall architecture
  • Number of Cores – 7 to 24 cores
  • API support – OpenGL ES 1.1, 2.0, 3.1, 3.2; Vulkan 1.1, 1.2; OpenCL 1.1, 1.2, 2.0 Full profile
  • AMBA 4 ACE, ACE-LITE, and AXI bus interface
  • Configurable 512KB – 2MB L2 cache
  • 4x/8x/16x MSAA Anti-aliasing
  • Adaptive Scalable Texture Compression (ASTC) – Low Dynamic Range (LDR) and High Dynamic Range (HDR)
  • Arm Frame Buffer Compression (AFBC) v1.3

Compared to Mali-G77, Mali-G78 is said to deliver up to 25% higher GPU performance and up to 15% better on-device ML performance. Mali-G78 is also more efficient, thanks in part to a new Fused Multiply-Add (FMA) unit in the execution engine that cuts that unit's energy consumption by 30%. The new Asynchronous Top Level feature, together with improved tiler and fragment dependency tracking, plays a key role in boosting performance in PC-like games such as Fortnite and PUBG.

There’s also a new “sub-premium” Mali-G68 GPU with many of the same features as Mali-G78 but limited to 6 cores for lower costs and power consumption.

More information can be found on the blog post, product page, and developer’s website.

Ethos-N78 NPU

Ethos-N78 Performance

The Ethos-N78 NPU supports up to 90 different configurations, with performance ranging from 1 to 10 TOPS, so partners can tune area efficiency (inferences/s/mm²), throughput (inferences/s), and average DRAM bandwidth (GB/s) to their needs. The new NPU also delivers up to twice the peak performance of Ethos-N77, 25% better performance efficiency, and 40% greater DRAM bandwidth efficiency.
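The three metrics partners can trade off are simple ratios. The sketch below computes them from a single measured run; the `NpuRun` class and every number in it are hypothetical illustrations, not Ethos-N78 data:

```python
from dataclasses import dataclass


@dataclass
class NpuRun:
    """Hypothetical measurement of one NPU configuration (all values illustrative)."""
    inferences: int   # inferences completed during the run
    seconds: float    # wall-clock duration of the run
    area_mm2: float   # silicon area of the configuration
    dram_bytes: int   # total DRAM traffic during the run

    def throughput(self) -> float:
        """Inferences per second."""
        return self.inferences / self.seconds

    def area_efficiency(self) -> float:
        """Inferences per second per mm^2."""
        return self.throughput() / self.area_mm2

    def avg_dram_bandwidth_gbs(self) -> float:
        """Average DRAM bandwidth in GB/s (decimal, 1e9 bytes)."""
        return self.dram_bytes / self.seconds / 1e9


run = NpuRun(inferences=1000, seconds=2.0, area_mm2=4.0, dram_bytes=3_000_000_000)
print(run.throughput(), run.area_efficiency(), run.avg_dram_bandwidth_gbs())
# 500.0 125.0 1.5
```

A larger configuration typically raises throughput but not necessarily area efficiency, which is why the two are listed as separate knobs.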

Visit the product page and/or read Arm's blog post for more details.

Arm Cortex-X1 and Arm Cortex-X Custom Program

Usually, Arm would stop here with its new IP announcements, but this year is a little different: the company has also introduced the Cortex-X Custom (CXC) program, through which partners can work with Arm engineers to design a CPU that closely matches their specific requirements and goes beyond Cortex-A performance, power, and area (PPA) targets.

Cortex-A78 vs Cortex-X1
Cortex-A78 vs Cortex-X1 Features Comparison

The first CPU of the CXC program is the Arm Cortex-X1. It delivers a 30% peak performance improvement over the Cortex-A77 and 22% over the just-announced Cortex-A78, as well as twice the machine learning (ML) performance of the Cortex-A77.
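The 22% figure follows from the other two numbers in the article: Cortex-X1 (+30%) and Cortex-A78 (about +7%) are both quoted relative to Cortex-A77, so the gain of one over the other is a ratio, not a difference. A minimal check, assuming those rounded percentages:

```python
def relative_gain(new_vs_base: float, other_vs_base: float) -> float:
    """Gain of `new` over `other` when both are expressed relative to the same baseline."""
    return new_vs_base / other_vs_base - 1.0


x1_vs_a77 = 1.30   # Cortex-X1 peak vs Cortex-A77 (+30%)
a78_vs_a77 = 1.07  # Cortex-A78 peak vs Cortex-A77 (about +7%)

gain = relative_gain(x1_vs_a77, a78_vs_a77)
print(f"Cortex-X1 vs Cortex-A78: +{gain:.1%}")  # Cortex-X1 vs Cortex-A78: +21.5%
```

Subtracting the percentages instead (30 - 7) would give 23%, slightly overstating the gap.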

Cortex-A77 vs Cortex-A78 vs Cortex-X1

DynamIQ technology also makes it possible to combine a single Cortex-X1 core with Cortex-A78 and Cortex-A55 cores in one cluster, boosting single-core performance (+30 percent) at the cost of a larger cluster area, due to the bigger core and the larger 8MB L3 cache.
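The idea behind this configuration is that a single-threaded workload runs on the fastest core in the cluster, so one big core is enough to lift peak single-thread performance. The sketch below models that under loudly stated assumptions: a 1+3+4 layout (one Cortex-X1, three Cortex-A78, four Cortex-A55), which the article does not spell out; relative peak figures taken from the percentages above (vs Cortex-A77); no figure at all for Cortex-A55; and a scheduler that always picks the fastest core. The 8-core limit reflects the maximum DynamIQ cluster size. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A DynamIQ cluster holds at most eight cores.
MAX_CORES_PER_CLUSTER = 8


@dataclass(frozen=True)
class CoreType:
    name: str
    # Peak single-thread performance relative to Cortex-A77 (None = not given in the article).
    rel_peak_perf: Optional[float]


CORTEX_X1 = CoreType("Cortex-X1", 1.30)
CORTEX_A78 = CoreType("Cortex-A78", 1.07)
CORTEX_A55 = CoreType("Cortex-A55", None)  # no figure available, so it is skipped below


def build_cluster(*groups: Tuple[CoreType, int]) -> List[CoreType]:
    """Expand (core type, count) pairs into a cluster, enforcing the 8-core limit."""
    cores = [core for core, count in groups for _ in range(count)]
    if len(cores) > MAX_CORES_PER_CLUSTER:
        raise ValueError(f"{len(cores)} cores exceeds the {MAX_CORES_PER_CLUSTER}-core limit")
    return cores


def peak_single_thread(cores: List[CoreType]) -> float:
    """Assume a single thread lands on the fastest core that has a known figure."""
    return max(c.rel_peak_perf for c in cores if c.rel_peak_perf is not None)


x1_cluster = build_cluster((CORTEX_X1, 1), (CORTEX_A78, 3), (CORTEX_A55, 4))
a78_cluster = build_cluster((CORTEX_A78, 4), (CORTEX_A55, 4))
print(len(x1_cluster), peak_single_thread(x1_cluster))    # 8 1.3
print(len(a78_cluster), peak_single_thread(a78_cluster))  # 8 1.07
```

This only captures peak single-thread behavior; the area cost mentioned above (bigger core, 8MB L3) is not modeled.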

Cortex-X1 DynamIQ

There’s no product page or developer information for Cortex-X1 just yet, so it may take a bit longer to come to market. You can still read more about it in the Arm Community blog post.

18 Comments
name
1 month ago

Marketing at work. Scream 20% higher performance for a given power envelope, but put in the fine print the fact that your comparison was between 2.6 GHz in 7FF and 3.0 GHz in 5FF. Or celebrate +100% ML performance by merely doubling the cache. Who could have expected such great gains from such a sliver of parameter increase?

Laurent
1 month ago

Yeah definitely beware of marketing 🙂 But in the last few years, Anandtech reviews have shown that ARM claims were rather accurate.

The +30% over Cortex-A77 for Cortex-X1 looks much more interesting. I hope we’ll soon get laptops with that chip.

crashoverride
1 month ago

I noticed that too. It's a 13.3% increase in clock frequency, which yields an actual 6.7% performance increase at the same clock rate.

Laurent
1 month ago

That’s exactly what’s written on an ARM slide: +7% at ISO frequency. And that’s no fine print.

theguyuk
1 month ago

Yet these will be multi-$100 devices. I wish Arm would work a little with the Arm media box market to promote standardisation of Linux drivers for peripherals, etc. Also, since these devices are mains powered, with no battery worries or lack of heatsink, allow higher clocking, etc. A standard 3 or 4 port PCI would allow custom use too.

Dan
1 month ago

Because Arm does not want to be outside of mobile or highly controllable environments. They want the markets that don’t care about kernels or drivers or peripherals. They’re happy to leave that dying market to Intel. Instead they are focused on disposable devices that don’t require longterm software support or servers that are happy to run LTS kernels for years instead of mainline.

dgp
1 month ago

>servers that are happy to run LTS kernels for years instead of mainline.

Maybe the lowend server stuff. People doing high performance/high throughput stuff might actually want to be able to use all of the new stuff happening with eBPF etc without having to backport it.

Dan
1 month ago

RHEL 7 is running a 3.10 kernel and RHEL 8 is on 4.18. ARM is targeting companies who use Redhat style distributions, not Ubuntu.

dgp
1 month ago

>RHEL 7 is running a 3.10 kernel and RHEL 8 is on 4.18.

Redhat kernels are heavily patched with all sorts of backported stuff so those numbers mean very little.

>ARM is targeting companies who use Redhat style distributions, not Ubuntu.

ARM doesn’t sell hardware. They are selling designs to whatever company thinks they can turn those designs into something that’ll make money. Maybe there are companies making chips that want RHEL because they somehow think they can beat generic x86 machines in the “stack in high, sell it cheap” bog standard hosting space but I doubt it because we…

theguyuk
1 month ago

So what silicon powers Roku, Amazon Fire TV, Nvidia Shield TV, Nintendo Switch, IoT boards, smart meters, routers, wireless earbuds, smart cars, digital video or cameras, TVs and smartwatches?

dgp
1 month ago

>So what silicon powers

Whatever is cheap to license and can be produced on whatever second hand fab has extra capacity that week.
Those products don’t use one core over another because of some quasi-religious love of how it runs poorly written C code that came out of some ancient version of GCC.

theguyuk
1 month ago

The point is arm is more than a one trick smartphone pony.

dgp
1 month ago

>The point is arm is more than a one trick smartphone pony.

Well, I would hope so considering they are a fabless IP company. But you know, licensing their high-end cores for smartphones is probably a good chunk of their income these days.

Dan
1 month ago

Those all count as “highly controllable” environments. Random people are not trying to run random PCI-E cards they found on eBay for $10 on those. They are locked down in firmware, kernel, and supported accessory interfaces.

Philipp Blum
1 month ago

This whole industry is broken. Just look at it. We have new chips every year, often without any software support or if, with a linux kernel which is already at least 1 year old. Why should they care when they can sell a new chip in the following year? The problem is that they get rewarded for this behavior, but they should get punished. It should be rewarded to be sustainable.

dgp
1 month ago

Would have been nice to see this with one of the newer v8 versions with the new security extensions and so forth.

Aurimas
1 month ago

Does this X1 look like a candidate for Apple's new Arm laptop?

Willy
1 month ago

I don’t think so yet considering the lower IPC (5 vs 7 or so). However I suspect that ARM wants to send a signal to such makers and gauge their interest. If there is some, they might issue an X2 or so with different performance levels focusing on more aggressive and expensive optimizations (maybe X3/X5/X7 to remind people of atoms or core iX).
