MIPS Based TritonAI 64 AI IP Platform to Enable Inferencing & Training at the Edge

After announcing their first MIPS Open release a few weeks ago, Wave Computing is back in the news with the announcement of TritonAI 64, an artificial intelligence IP platform combining MIPS 64-bit + SIMD open instruction set architecture with the company’s WaveTensor subsystem for the execution of convolutional neural network (CNN) algorithms, and WaveFlow flexible, scalable fabric for more complex AI algorithms.

TritonAI 64 can scale up to 8 TOPS/Watt, over 10 TOPS/mm2 using a standard 7nm process node, and eventually would allow both inference and training at the edge.

TritonAI-64
Click to Enlarge

The platform supports 1 to 6 cores with MIPS64r6 ISA boasting the following features:

  • 128-bit SIMD/FPU
  • 8/16/32/int, 32/64 FP datatype support
  • Virtualization extensions
  • Superscalar 9-stage pipeline w/SMT
  • Caches (32KB-64KB), DSPRAM (0-64KB)
  • Advanced branch predict and MMU
  • Integrated L2 cache (0-8MB, opt ECC)
  • Power management (F/V gating, per CPU)
  • Interrupt control with virtualization
  • 256b native AXI4 or ACE interface
TritonAI 64 Block Diagram
Click to Enlarge

Here’s the description provided by the company for their WaveTensor and WaveFlow AI blocks:

  • WaveTensor Technology – The WaveTensor subsystem can scale up to a PetaOP of 8-bit integer operations on a single core instantiation by combining extensible slices of 4×4 or 8×8 kernel matrix multiplier engines for the highly efficient execution of today’s key Convolutional Neural Network (CNN) algorithms. The CNN execution performance can scale up to 8 TOPS/watt and over 10 TOPS/mm2 in industry standard 7nm process nodes with libraries using typical voltage and processes.
WaveTensor
WaveTensor offers configurable 4×4 or 8×8 MAC tiles
  • WaveFlow Technology – Wave Computing’s highly flexible, linearly scalable fabric is adaptable for any number of complex AI algorithms, as well as conventional signal processing and vision algorithms. The WaveFlow subsystem features low latency, single batch size AI network execution, and reconfigurability to address concurrent AI network execution. This patented WaveFlow architecture also supports algorithm execution without intervention or support from the MIPS subsystem.
WaveFlow
WaveFlow scales to up to 1K tiles with 16 CPUs and 8 MACs (8int) each

Basically, I understand that WaveTensor is what you’d use for the highest efficiency, and leveraging existing AI frameworks like TensorFlow Lite, while WaveFlow provides more flexibility at, I suspect, the cost of higher power consumption.

One the software side, Wave Computing offers the TritonAI Application Programming Kit (APK) with the following components:

  • GCC Binutil tools and IDE
  • Debian Linux OS support and updates
  • WaveRT API abstracts framework calls
  • WaveRT Optimized AI libraries for
  • CPU/SIMD/WaveFlow/WaveTensor
  • TensorFlow-lite build support and updates
  • TensorFlow build for edge training roadmap
TritonAI Application Programming Kit
Click to Enlarge

Wave Computing TritonAI 64 is now available for licensing. More details may be found on the product page.

Via LinuxGizmos

Support CNX Software - Donate via PayPal or become a Patron on Patreon

Leave a Reply

avatar
  Subscribe  
Notify of