Kendryte K510 tri-core RISC-V AI processor delivers up to 3 TOPS

Kendryte K510 Block Diagram

Kendryte K510 is a 64-bit tri-core RISC-V processor clocked at up to 800 MHz with AI accelerators. It succeeds the 400 MHz Kendryte K210 dual-core RISC-V AI processor released a few years ago, first in the Kendryte KD233 board, and then in boards like Maixduino or Grove AI HAT that are conveniently programmable with Arduino or MicroPython. Canaan formally announced the processor yesterday at the 2021 World Artificial Intelligence Conference, claiming the K510 has three times the performance of the K210, making it suitable for UAV high-definition aerial photography, high-definition panoramic video conferences, robotics, STEAM education, driver assistance scenarios, and industrial and professional cameras. The press release did not have much information, but multiple sources provided additional details to CNX Software, so we have Kendryte K510 specifications: Processor – 2x 64-bit RISC-V cores @ 800 MHz, and 1x 64-bit RISC-V core @ 800 MHz with DSP extension AI subsystem with 3 TOPS in total KPU: General […]

AIfES for Arduino high-efficiency AI framework for microcontrollers becomes open source

AIfES for Arduino

AIfES (AI for Embedded Systems) is a standalone, high-efficiency AI framework that allows the Fraunhofer Institute for Microelectronic Circuits and Systems, or Fraunhofer IMS for short, to train and run machine learning algorithms on resource-constrained microcontrollers. Until now, the framework was closed-source and only used internally by Fraunhofer IMS, but following a collaboration with Arduino, AIfES for Arduino is now open-source and free to use for non-commercial projects. The framework has been optimized to allow 8-bit microcontrollers such as the one found in Arduino Uno to implement an Artificial Neural Network (ANN) that can be trained in moderate time. That means offline inference and training on small self-learning battery-powered devices are possible with AIfES without relying on the cloud or other devices. The library implements Feedforward Neural Networks (FNN) that can be freely parameterized, trained, modified, or reloaded at runtime. Programmed in C, AIfES uses only standard libraries based […]
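To make the idea of a freely parameterized feedforward network more concrete, here is a minimal sketch of a forward pass in plain Python. It does not use the AIfES API (AIfES itself is written in C); the function names, the 2-3-1 layout, the sigmoid activation, and all weights are assumptions chosen purely for illustration.

```python
import math


def sigmoid(x):
    """Logistic activation, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One fully connected layer: out[j] = sigmoid(sum_i w[j][i] * in[i] + b[j])."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Run the inputs through each (weights, biases) layer in order."""
    for weights, biases in layers:
        inputs = dense(inputs, weights, biases)
    return inputs


# Hypothetical 2-3-1 network; the weights are arbitrary, not trained values.
LAYERS = [
    ([[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]], [0.0, -0.5, 0.25]),
    ([[1.0, -1.0, 0.5]], [0.1]),
]

output = forward([1.0, 0.0], LAYERS)
print(output)
```

Because the network is just a list of (weights, biases) pairs, swapping in different parameters at runtime only means passing a different list to `forward`.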

Vecow ABP-3000 AI Edge gateway combines Hailo-8 AI accelerator with Intel Whiskey Lake processor

Vecow ABP-3000-AI Hailo-8 accelerator

We first discovered the Hailo-8 AI accelerator with claims of up to 26 TOPS performance and 3 TOPS/W efficiency in October 2020. Since then, we’ve seen several companies integrate a Hailo-8 M.2 module into their designs, including the EdgeTuring Edge AI camera and the Vecow VAC-1000 gateway with a 24-core Foxconn processor. Vecow has now integrated the Hailo-8 AI accelerator into another gateway, but instead of relying on an Arm processor, the Vecow ABP-3000 AI computing system features an 8th generation Intel Core Whiskey Lake processor. Vecow ABP-3000 specifications: SoC – Intel Core i7-8665UE or i3-8145UE quad-core Whiskey Lake processor with Intel UHD Graphics 620; 15W TDP System Memory – 2x DDR4 2400MHz SO-DIMM, up to 64GB Storage – 1x M.2 Key B Socket (PCIe x2/SATA) AI Accelerator – Hailo-8 AI Processor, up to 26 TOPS with TensorFlow, ONNX frameworks support System IO chip – IT8786E Video Output – 2x DisplayPort up to 4096 x […]

Jevois Pro small AI camera with Amlogic A311D SoC offers up to 13 TOPS (Crowdfunding)

JeVois Pro

The JeVois-A33 smart camera was a tiny Linux camera with an Allwinner A33 processor designed for computer vision applications and announced in 2016. I had the opportunity to review the computer vision camera the following year, and it was fun to use to learn about computer vision with many examples. But since it relied on the CPU for processing, it would not have been suitable for all projects due to the lag: for example, object detection took 500 ms and YOLO v3 around 3 seconds per inference. Time has passed, and great progress has been made in the computer vision and AI fields, with the tasks now usually handled by a built-in NPU or an AI accelerator card. So the JeVois Pro deep learning camera has just been launched with an Amlogic A311D processor featuring a 5 TOPS NPU, and support for up to 13 TOPS via a Myriad X or Google […]

Coral Dev board news – NXP critical firmware update, manufacturing demo, and WebCoral in Chrome

WebCoral Coral USB Accelerator Chrome

Google Coral is a family of development boards, modules, M.2/mPCIe cards, and USB sticks with support for local AI, aka on-device or offline AI, based on the Google Edge TPU. The company has just published some updates with one important firmware update, a manufacturing demo for worker safety & visual inspection, and the ability to use the Coral USB accelerator in Chrome.

Coral firmware update prevents board’s excessive wear and tear

If you own the original Coral development board or system-on-module based on the NXP i.MX 8M processor, you may want to update your Mendel Linux installation with:

The update includes a patch from NXP with a critical fix to part of the SoC power configuration. Without this patch, the SoC might be overstressed and the lifetime of your board could be reduced. Note this only affects NXP-based boards, so other Coral products such as Coral Dev Mini powered by MediaTek MT8167S […]

NVIDIA TAO Transfer Learning Toolkit (TLT) 3.0 released with pre-trained models

NVIDIA TAO Transfer Learning Toolkit

NVIDIA first introduced the TAO (Train, Adapt and Optimize) framework to ease AI model training on NVIDIA GPUs as well as NVIDIA Jetson embedded platforms last April during GTC 2021. The company has now announced the release of the third version of the TAO Transfer Learning Toolkit (TLT 3.0) together with some new pre-trained models at CVPR 2021 (2021 Conference on Computer Vision and Pattern Recognition). The newly released pre-trained models are applicable to computer vision and conversational AI, and NVIDIA claims the release provides a set of powerful productivity features that boost AI development by up to 10 times. Highlights of TAO Transfer Learning Toolkit 3.0 include various pre-trained computer vision models: a body pose estimation model that supports real-time inference on the edge with 9x faster inference performance than the OpenPose model, emotion recognition, facial landmark, license plate detection and recognition, heart rate estimation, gesture recognition, gaze estimation […]

Benchmarking TinyML with MLPerf Tiny Inference Benchmark

MLPerf Tiny Inference Benchmark

As machine learning moves to microcontrollers, something referred to as TinyML, new tools are needed to compare different solutions. We’ve previously posted some TensorFlow Lite for Microcontrollers benchmarks (for single board computers), but a benchmarking tool specifically designed for AI inference on resource-constrained embedded systems could prove useful for obtaining consistent results and covering a wider range of use cases. That’s exactly what MLCommons, an open engineering consortium, has done with the MLPerf Tiny Inference benchmarks, designed to measure how quickly a trained neural network can process new data on tiny, low-power devices, with optional power measurement. MLPerf Tiny v0.5, the first inference benchmark suite designed for embedded systems from the organization, consists of four benchmarks: Keyword Spotting – Small vocabulary keyword spotting using DS-CNN model. Typically used in smart earbuds and virtual assistants. Visual Wake Words – Binary image classification using MobileNet. In-home security […]
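The core measurement behind such a benchmark, how quickly a trained model processes new data, can be sketched in a few lines of Python. This is not the MLPerf Tiny harness; `measure_latency`, `dummy_infer`, and the run counts are hypothetical names and values used only to illustrate timing repeated inferences after a warm-up phase.

```python
import statistics
import time


def measure_latency(infer, sample, runs=50, warmup=5):
    """Time repeated calls to infer(sample) and report the median latency."""
    # Warm-up calls are not timed, so one-off setup costs do not skew results.
    for _ in range(warmup):
        infer(sample)

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(sample)
        timings.append(time.perf_counter() - start)

    median = statistics.median(timings)
    return {
        "median_s": median,
        "inferences_per_s": 1.0 / median if median > 0 else float("inf"),
    }


def dummy_infer(sample):
    """Stand-in for a real model: any callable taking one input works here."""
    return sum(x * x for x in sample)


result = measure_latency(dummy_infer, [0.1] * 1000)
print(result)
```

Reporting the median rather than the mean is a design choice in this sketch that keeps occasional slow runs from dominating the figure.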

Software-based neural video decoder leverages AI accelerator on Snapdragon 888

Software video decoding AI accelerator

Sometimes hardware blocks get to work on tasks they were not initially designed to handle. For example, AI inference used to be mostly offloaded to the GPU before neural network accelerators became more common in SoCs. Qualcomm AI Research has now showcased a software-based neural video decoder that leverages both the CPU and AI engine in the Snapdragon 888 processor to decode a 1280×704 HD video at over 30 fps without any help from the video decoding unit. The neural video decoder is still a work in progress as it only supports intra frame decoding, and inter frame decoding is being worked on. That means each frame is currently decoded independently without taking into account small changes between frames as all other video codecs do. The CPU handles parallel entropy decoding while the decoder network is accelerated on the 6th generation Qualcomm AI Engine found in the Snapdragon 888 mobile platform. This […]