Updated x86/x64 Software Optimization Manual Reveals More Intel Tremont Details

We first found out about the Tremont microarchitecture in April 2018, when some Intel documents and the Linux mainline source showed it was likely meant to be the successor to Goldmont Plus. Last year, Intel formally announced the Tremont architecture, providing some details along with a block diagram and key features.

But this morning, we were informed by email of a new revision of the x86/x64 Software Optimization Manual (PDF) with even more details about the Tremont architecture.

Tremont Microarchitecture Processor Pipeline

If you want to know all the details, jump to the "4.1 Tremont Architecture" section of the document, but here are some of the highlights and improvements over the Goldmont Plus microarchitecture:

  • Enhanced branch prediction unit.
    • Increased capacity with improved path-based conditional and indirect prediction.
    • New committed Return Stack Buffer.
  • Clustered 6-wide out-of-order front-end fetch and decode pipeline.
    • Banked ICache with dual 16B reads.
    • Two 3-wide decode clusters enabling up to 6 instructions per cycle.
  • Deeper back-end out-of-order windows.
  • Dedicated integer and vector integer/floating-point store data ports.
  • 33% increase in size of the L1 data cache, from 24KB to 32KB.
  • Larger 2nd level TLB:
    • From 512 to 1,024 entries for 4K pages
    • From 32 to 64 entries for 2M/4M pages
  • L2 cache size from 1MB to 4.5MB depending on SoC (up to 4.5MB in Snow Ridge and up to 1.5MB in Lakefield)
  • Larger load and store buffers.
  • Dual generic load and store execution pipes capable of 2 loads, 2 stores, or 1 load and 1 store per cycle.
  • New and improved cryptography.
    • New Galois-field instructions (GFNI).
    • Dual AES units.
    • Enhanced SHA-NI implementation.
    • Faster PCLMULQDQ.
  • Support for user-level low-power and low-latency spin-loop instructions UMWAIT/UMONITOR and TPAUSE
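On a Linux machine, one quick way to see whether a CPU exposes the instruction-set additions listed above is to look at the flags line of /proc/cpuinfo. The snippet below is a minimal sketch rather than an official detection method: the helper names (`parse_cpu_flags`, `report_features`) are made up for illustration, and the mapping assumes the flag names Linux normally uses for these features (`gfni`, `aes`, `sha_ni`, `pclmulqdq`, and `waitpkg` for UMWAIT/UMONITOR/TPAUSE).

```python
# Assumed mapping from the features listed above to Linux /proc/cpuinfo flag names.
TREMONT_FLAGS = {
    "GFNI": "gfni",
    "AES-NI": "aes",
    "SHA-NI": "sha_ni",
    "PCLMULQDQ": "pclmulqdq",
    "UMWAIT/UMONITOR/TPAUSE": "waitpkg",
}


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags found on the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report_features(cpuinfo_text):
    """Map each feature name to True/False depending on whether its flag is present."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in TREMONT_FLAGS.items()}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        # Not Linux, or /proc is unavailable: report everything as missing.
        text = ""
    for name, present in report_features(text).items():
        print(f"{name}: {'yes' if present else 'no'}")
```

Flags only tell you what the kernel reports for the running CPU; they say nothing about how fast a given implementation is.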

Tremont Cache

The document mentions both Snow Ridge and Lakefield processors. Both are part of the Atom family, and we previously covered the Intel Atom P5900 “Snow Ridge” processors for networking applications, including 5G base stations. Those processors ship with up to 27MB cache, so now I understand the 4.5MB cache is per cluster of four Tremont cores, which explains why, for example, the 8-core Atom P5921B comes with 9MB cache, and the 24-core Atom P5962B with 27MB cache.
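The arithmetic behind those cache figures is easy to check. The sketch below assumes, as inferred above, 4.5MB of L2 cache per cluster of four Tremont cores; the function name `total_l2_mb` is just an illustrative helper.

```python
L2_PER_CLUSTER_MB = 4.5   # Snow Ridge figure from the manual
CORES_PER_CLUSTER = 4     # assumption: one L2 slice per four-core cluster


def total_l2_mb(cores):
    """Total L2 cache in MB for a core count made of whole four-core clusters."""
    if cores <= 0 or cores % CORES_PER_CLUSTER != 0:
        raise ValueError("core count must be a positive multiple of 4")
    clusters = cores // CORES_PER_CLUSTER
    return clusters * L2_PER_CLUSTER_MB


if __name__ == "__main__":
    for name, cores in (("Atom P5921B", 8), ("Atom P5962B", 24)):
        print(f"{name}: {cores} cores -> {total_l2_mb(cores)} MB")
```

With 8 cores this gives 2 clusters and 9MB, and with 24 cores it gives 6 clusters and 27MB, matching the cache sizes quoted for the two parts.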

It’s not the first time we have heard about the Lakefield hybrid processor either. It combines four Tremont cores with one high-performance Sunny Cove core in a similar fashion to Arm’s big.LITTLE or DynamIQ technologies. The processor will also come with 64 EU Intel Gen11 graphics, a Gen11.5 display core supporting 5K60 and 4K120 video output, and a media core capable of handling 4K60 and 8K30 video decoding.

Intel Atom processors are normally supposed to be low-cost SoCs, but neither the Snow Ridge nor the Lakefield processors mentioned above will be found in entry-level hardware. We may have to wait a little longer to learn about entry-level Tremont processors, which should be the Elkhart Lake processors succeeding the Gemini Lake family.

Via InstLatX64 and thanks to “NewsTips” for the tip.
