Arm has just unveiled the Cortex-R82, a 64-bit real-time processor that is Linux-capable and designed for “next-generation enterprise and computational storage solutions”.
To understand what we’re dealing with, let’s first look at how the SNIA website defines computational storage:
Computational Storage is defined as architectures that provide Computational Storage Services coupled to storage, offloading host processing, or reducing data movement. A Computational Storage Service (CSS) is a data service or information service that performs computation on data where the service and data are associated with a storage device.
If I understand correctly, so far all we have asked of SSDs, hard drives, and other storage is to store data and move it as fast as possible to a host device capable of analyzing it. Computational storage brings that analysis to the storage device itself, so we may soon have smart hard drives that run Linux and do some of the processing on the device.
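To make the "reducing data movement" part of the definition concrete, here is a toy Python model. The `PlainDrive` and `ComputationalDrive` classes are hypothetical names invented for this sketch and do not correspond to any SNIA or Arm API; the only thing being illustrated is how many records cross the host interface in each case.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PlainDrive:
    """Toy model of a conventional drive: it can only hand records to the host."""
    records: List[int]
    transferred: int = 0  # records moved across the host interface so far

    def read_all(self) -> List[int]:
        # Every record has to travel to the host before it can be analyzed.
        self.transferred += len(self.records)
        return list(self.records)


@dataclass
class ComputationalDrive(PlainDrive):
    """Toy model of a drive that can run a filter next to the data."""

    def query(self, predicate: Callable[[int], bool]) -> List[int]:
        # The filter runs "on the drive"; only matching records are moved.
        matches = [r for r in self.records if predicate(r)]
        self.transferred += len(matches)
        return matches


def is_hit(record: int) -> bool:
    """Hypothetical selection criterion: keep one record in a hundred."""
    return record % 100 == 0


data = list(range(1000))

plain = PlainDrive(data)
host_side = [r for r in plain.read_all() if is_hit(r)]   # host does the filtering

smart = ComputationalDrive(data)
drive_side = smart.query(is_hit)                          # drive does the filtering

print(plain.transferred, smart.transferred)
```

Both approaches return the same ten records, but the plain drive moves all 1,000 records to the host while the computational drive moves only the ten matches.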
Key features and specifications:
- Architecture – Armv8-R AArch64
- Compliant with Armv8.4-A extensions
- Instruction Set – A64 instruction set
- Up to eight cores with in-cluster hardware coherency
- Microarchitecture – Eight-stage, in-order, superscalar pipeline with direct and indirect branch prediction.
- Cache controllers
- Separate L1 data cache and L1 instruction cache private to each core.
- An optional, shared (between all cores), and unified (instructions and data) L2 cache.
- Partial L2 cache power-down support.
- Tightly-Coupled Memories (TCM) – 2x optional TCMs private to each core: an ITCM for instructions and literal pool data and a DTCM for data.
- Cache protection
- Reliability, Availability, and Serviceability (RAS) extension.
- Optional ECC, Single Error Correct Double Error Detect (SECDED) or Double Error Detect (DED) protection for all of the instantiated cache tag and data RAMs, the TCM RAMs, and the TLB RAMs.
- Interrupt interface – Standard interrupt inputs (IRQ and FIQ) are provided together with an interface to an external GICv3.2-compliant Generic Interrupt Controller (GIC)
- Memory Protection Unit (MPU)
- 2x optional and programmable MPUs controlled from EL1 and EL2 respectively.
- Configure attributes for up to 32 regions per MPU. Regions cannot overlap.
- Memory Management Unit (MMU) – Optional EL1 MMU for fine-grained memory system control through virtual-to-physical address mappings and memory attributes held in translation tables.
- Floating Point Unit (FPU) and Advanced SIMD (Neon)
- Optional FPU implementing the Arm Vector and Floating-Point architecture VFPv4 with 32 x 128-bit registers, compliant with IEEE754. Supports Advanced SIMD, half-precision, single-precision, double-precision
- Master bus – Shared Main Master (MM) port implemented as AXI5 256-bit providing access for instructions, data, and peripherals. This interface can optionally be a 256-bit CHI-E interface.
- Slave bus – 128-bit shared AXI-S port used for two purposes:
- As an LLRAM Accelerator Coherency Port enabling I/O coherent external access to the LLRAM port.
- As a TCM slave enabling external agents to access the TCMs within the cores.
- Low Latency RAM Port (LLRAM) – Optional AXI5 256-bit shared LLRAM port providing low-latency access for instructions and data. The port is designed to connect to local memory. This local memory provides many of the benefits of TCM and in addition, can be slower and lower power and also easily shared between the up-to-eight processor cores.
- Shared Peripheral Port (SPP) – Optional AXI5 64-bit SPP for providing access to peripherals.
- Low Latency Peripheral Port (LLPP) – An optional per core dedicated 32-bit AXI5 port to integrate latency-sensitive peripherals tightly with a specific core within the processor.
- Main Accelerator Coherency Port (MACP) – ACE5-Lite 128-bit shared slave MACP for external access to MM address ranges. MACP enables I/O coherency for external agents with the per-core L1 data cache and shared L2 cache.
- Debug – Debug Access Port is provided. Its functionality can be extended using CoreSight Debug and Trace.
- Trace – Cortex-R82 includes one CoreSight Embedded Trace Macrocell (ETM) per core.
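The MPU constraints in the list above (up to 32 regions per MPU, no overlapping regions) are easy to check before a configuration is ever written to hardware. The following is a minimal Python sketch of such a check. The `Region` type and `validate_regions` function are hypothetical helpers invented here, and describing a region by an inclusive base and limit address is an assumption of this sketch rather than something stated in the article.

```python
from typing import List, NamedTuple

MAX_REGIONS = 32  # per-MPU limit quoted in the specification list


class Region(NamedTuple):
    base: int   # first byte of the region (assumed representation)
    limit: int  # last byte of the region, inclusive (assumed representation)


def validate_regions(regions: List[Region]) -> None:
    """Raise ValueError if the region set breaks the two constraints above."""
    if len(regions) > MAX_REGIONS:
        raise ValueError(f"{len(regions)} regions exceed the limit of {MAX_REGIONS}")

    for region in regions:
        if region.limit < region.base:
            raise ValueError(f"region {region} ends before it starts")

    # After sorting by base address, any overlap must show up between
    # neighbours, so checking adjacent pairs is sufficient.
    ordered = sorted(regions)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.base <= prev.limit:
            raise ValueError(f"regions {prev} and {cur} overlap")


# Two disjoint regions pass silently.
validate_regions([Region(0x0000, 0x0FFF), Region(0x1000, 0x1FFF)])

# A region starting inside another one is rejected.
try:
    validate_regions([Region(0x0000, 0x0FFF), Region(0x0800, 0x17FF)])
    overlap_rejected = False
except ValueError:
    overlap_rejected = True

print(overlap_rejected)
```

Real hardware also imposes alignment rules on region boundaries, which this sketch deliberately leaves out.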
The previous real-time core from Arm was the 32-bit Cortex-R8, and Cortex-R82 is the company’s first 64-bit real-time core. We seldom read about those cores since they are embedded into storage controllers, cellular modems, automotive chips, etc.
While Cortex-R8 focuses on real-time workloads, Cortex-R82 is much more powerful: it can handle both real-time and application-level workloads, and can address up to 1TB of memory.
Cortex-R82 cores can be reconfigured on the fly (by software) to perform real-time or compute tasks as needed, as shown in the illustration below.
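For a sense of scale, the short Python calculation below works out how many address bits 1TB of memory requires and how that compares with what a plain 32-bit address can reach. It assumes "1TB" means 2^40 bytes, which the announcement does not specify.

```python
# Assumption: "1TB" is read as 2**40 bytes (1 TiB); the source does not say which.
ADDRESSABLE_BYTES = 1 << 40

# Number of address bits needed to give each byte a distinct address.
address_bits = (ADDRESSABLE_BYTES - 1).bit_length()

# A plain 32-bit address reaches at most 2**32 bytes (4 GiB).
LIMIT_32BIT_BYTES = 1 << 32
ratio = ADDRESSABLE_BYTES // LIMIT_32BIT_BYTES

print(f"{address_bits} address bits, {ratio}x a 32-bit address space")
# prints "40 address bits, 256x a 32-bit address space"
```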
It’s already possible to create something similar with Cortex-A cores running Linux, and Cortex-R5/R8 cores for real-time processing, but using Cortex-R82 cores simplifies the overall system architecture.
Cortex-R82 will find its use in IoT, ML, and edge computing, especially in storage applications for database acceleration, increased security and privacy, and real-time video transcoding.
Arm also noted that computational storage is important for transportation, with airplanes now generating terabytes of data a day that need to be offloaded for analysis. Cortex-R82 can enable real-time analysis of this data on the drive, leading to faster turnarounds (30 minutes).
License plate recognition is another use case that could benefit from the new processor. For instance, a system could collect vehicle registration plate data during the day (using real-time storage processing), then process that data for billing and machine learning (using computational power) at night.
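The day/night split in the license plate example can be pictured as a simple policy that decides how many of the up-to-eight cores handle real-time work and how many run compute tasks. The Python sketch below is purely illustrative: `plan_cores`, the 06:00 to 22:00 "day" window, and the default six-to-two split are all assumptions made up for this example, and the code says nothing about how the actual reconfiguration is performed on the processor.

```python
from typing import Dict

TOTAL_CORES = 8  # maximum cluster size from the specification list


def plan_cores(hour: int, day_rt_cores: int = 6) -> Dict[str, int]:
    """Return a hypothetical real-time / compute core split for a given hour."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in the range 0..23")
    if not 0 <= day_rt_cores <= TOTAL_CORES:
        raise ValueError("day_rt_cores must be in the range 0..TOTAL_CORES")

    # Assumed policy: plates are collected between 06:00 and 22:00, so most
    # cores do real-time storage work then; at night the split is mirrored
    # so most cores run billing and machine learning jobs.
    is_day = 6 <= hour < 22
    real_time = day_rt_cores if is_day else TOTAL_CORES - day_rt_cores
    return {"real_time": real_time, "compute": TOTAL_CORES - real_time}


print(plan_cores(12))  # midday
print(plan_cores(2))   # night
```

With the default arguments, the midday call assigns six cores to real-time work and two to compute, and the night call assigns two and six respectively.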