HiSilicon Hi1620 Server SoC to Feature up to 64 Arm “Ares” Cores

A few years ago we covered the HiSilicon D02 server board powered by the company’s Hip05 SoC with 16 or 32 Arm Cortex-A57 cores. I had not seen any updates since then, but HiSilicon has released new “TaiShan” Arm-based server SoCs every year, and recently unveiled Hi1620, the world’s first 7nm datacenter Arm processor, featuring 24 to 64 Arm “Ares” cores clocked at up to 3.0 GHz. Ares cores are supposed to greatly improve single-thread performance in order to compete with x86 server chips.

Hisilicon Hi1620

HiSilicon Hi1620 processor specifications:

  • CPU – 24 to 64 Ares ARMv8.2 cores clocked at 2.4 – 3.0 GHz
  • Cache – L1: 64KB I-cache, 64KB D-cache; L2: 512KB private per core; L3: 24 to 64MB shared among cores (1MB/core)
  • Memory – 8x DDR4 channels up to 3200 MHz
  • Interconnect – Coherent SMP interface for 2S & 4S, 3 ports up to 240 Gbit/s per port
  • I/Os
    • 40x PCIe Gen 4.0 lanes
    • 2x 100 GbE, RoCEv2/RoCEv1, CCIX
    • 4x USB 3.0
    • 16x SAS 3.0, 2x SATA 3.0
  • Package – 75 x 60 mm, BGA
  • Power – 100 to 200 Watts TDP
  • Process – 7 nm
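
To put a few of those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes "3200 MHz" means DDR4-3200 (3200 MT/s) with the standard 64-bit data bus per channel, and it simply multiplies ports by per-port speed for the SMP interconnect. The function names are made up for illustration, and the results are theoretical peaks, not measured figures from HiSilicon.

```python
# Back-of-the-envelope peaks derived from the spec list above.
# Assumptions (not from HiSilicon): DDR4-3200 means 3200 MT/s on a
# 64-bit data bus per channel; interconnect ports simply add up.

def ddr_peak_bandwidth_gbs(channels, mega_transfers_per_s, bus_width_bits=64):
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


def l3_total_mb(cores, mb_per_core=1):
    """Total shared L3 in MB, given the 1MB/core figure."""
    return cores * mb_per_core


def smp_aggregate_gbs(ports, gbit_per_port):
    """Aggregate coherent-link bandwidth in GB/s (8 bits per byte)."""
    return ports * gbit_per_port / 8


if __name__ == "__main__":
    print(f"DDR4 peak: {ddr_peak_bandwidth_gbs(8, 3200):.1f} GB/s")
    for cores in (24, 64):
        print(f"{cores} cores -> {l3_total_mb(cores)} MB L3")
    print(f"SMP links: {smp_aggregate_gbs(3, 240):.1f} GB/s aggregate")
```

Under those assumptions, the eight memory channels work out to about 204.8 GB/s per socket, and the three SMP ports to 90 GB/s in aggregate.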

AnandTech reports that vendors expect Ares cores to achieve Intel Skylake levels of performance, and Hi1620 is said to be fine-tuned for memory-bound workloads such as CAE/CFD, weather, and life science. Although an internal HiSilicon D06 development board exists, Huawei did not show any samples at the event, so it will take some more time before the processor becomes available. Arm has not provided details about the Ares architecture yet either, and we should expect more details next year.

As a side note, Arm has made progress in high-performance computing, as there’s now one Arm supercomputer that made it to the TOP500 list: Astra, built by HPE, deployed at Sandia National Laboratories, and equipped with 125,328 Cavium ThunderX2 cores delivering an HPL (High-Performance Linpack) score of 1.5 petaflops. It’s currently listed at number 204.


19 Replies to “HiSilicon Hi1620 Server SoC to Feature up to 64 Arm “Ares” Cores”

          1. 16nm is what’s written somewhere 🙂

            I have to admit that I have no idea how to interpret these ‘numbers’ (other than being something used by TSMC’s marketing).

            According to https://en.wikichip.org/wiki/16_nm_lithography_process

            ‘The term “16 nm” is simply a commercial name for a generation of a certain size and its technology, as opposed to gate length or half pitch’ and ‘An enhanced version of TSMC’s 16nm process was introduced in late 2016 called “12nm”‘ and ‘TSMC uses the same BEOL as its 20nm process’.

            Confusing when 20 is almost the same as 16 and then again as 12.

          2. as far as i understand, 16/14/12 are part of finfet family and 20nm isn’t

            but things get even more complicated when you take into account density
            if my memory serves me right, Intel 14nm is much more dense than TSMC 12nm (to be double-checked)

          3. Those numbers make the most sense when compared against other fabnodes by the same fab — then the ratios/savings are clear (as they’re usually quoted by the fab).

          4. That’s one area where Intel seem to show the most, erm, ingenuity, even more so than in their TDP metrics. ‘We have the best litho process if we count SRAM and flipflop transistors separately’ — Really? If an uarch requires this much SRAM to function adequately are you going to pretend your chip can work SRAM-free?

            The thing that matters transistor-density-wise is entirely in the context of Performance/Power/Area (PPA):

            power/transistor, performance/transistor & transistor/area -> performance/area & power/area

            There’s nothing deeper that matters — at the end of the day you have N mm^2 of silicon, doing M units of work per P joules, period. That also implies that if you have the ultimate litho tech in the universe, but your uarch plain sucks, your gate size is the least of your problems.

          5. And Intel has tried to make 10nm work for how many years now? They have so far only delivered one commercial CPU using it and it’s a turd in more ways than one. Unfortunately, for Intel that is, TSMC has overtaken them and so has Samsung by now. This stuff is really, really hard to make, even for a massive company like Intel.

          6. That was apparently a legal move by Intel to come clean in front of their 10nm fab clients (who suffered massively due to the delays). For all intents and purposes, Intel has not shipped viable 10nm products.

  1. After Qualcomm Centriq died, maybe Huawei is the only ARM server player that can compete with x86/Intel.
    The interesting thing is CCIX. With CCIX integrated this processor can connect FPGA or ASIC accelerators with low latency and high throughput for heterogeneous computing. Xilinx is also adding CCIX into its FPGA and SoC.

    1. Your post prompted me to read up on CCIX. It does appear to be a unified answer to IBM’s CAPI (IBM used to be a founding member of the CCIX consortium, but left?) and Nvidia’s NVLink.

      BTW, any particular reason to disregard Cavium/Marvell and Ampere as competent server-chip vendors? : )

  2. this is sooo nice, but when will VMware release ESXi for ARM? I’m waiting for it so impatiently for my lab 🙁
