Movidius Neural Compute Stick Shown to Boost Deep Learning Performance by about 3 Times on Raspberry Pi 3 Board

Intel recently launched the Movidius Neural Compute Stick (MvNCS), a low-power USB device for deep learning applications such as object recognition. After some initial confusion, we could confirm that the stick can also be used on ARM-based platforms such as the Raspberry Pi 3. Koichi Nakamura, who wrote the code for GPU-accelerated object recognition on the Raspberry Pi 3 board, got hold of a sample in order to compare the performance of GPU and MvNCS acceleration. His first attempt was quite confusing: with GoogLeNet, the Raspberry Pi 3 + MvNCS achieved an average inference time of about 560 ms, against 320 ms using the VideoCore IV GPU in the RPi3 board. But it was then discovered that the “stream_infer.py” demo used only one of the 12 VLIW 128-bit vector SHAVE processors in Intel’s Movidius Myriad 2 VPU, and after enabling all 12 cores instead of just one, performance improved to around 108 ms average per inference. […]
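
The roughly 3x figure in the headline follows directly from the reported averages. The small Python sketch below reproduces that arithmetic using only the three timings quoted above; the throughput helper is an added assumption that frames are processed strictly one at a time, which may not match how the demo actually ran.

```python
# Average GoogLeNet inference times on Raspberry Pi 3, as reported above (ms).
GPU_VIDEOCORE_MS = 320.0   # VideoCore IV GPU
NCS_ONE_SHAVE_MS = 560.0   # MvNCS, stream_infer.py default (1 SHAVE core)
NCS_ALL_SHAVES_MS = 108.0  # MvNCS with all 12 SHAVE cores enabled


def speedup(baseline_ms: float, accelerated_ms: float) -> float:
    """Return how many times faster `accelerated_ms` is than `baseline_ms`."""
    return baseline_ms / accelerated_ms


def throughput_fps(latency_ms: float) -> float:
    """Inferences per second, assuming requests are handled one at a time."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    print(f"12 SHAVEs vs GPU:     {speedup(GPU_VIDEOCORE_MS, NCS_ALL_SHAVES_MS):.2f}x")
    print(f"12 SHAVEs vs 1 SHAVE: {speedup(NCS_ONE_SHAVE_MS, NCS_ALL_SHAVES_MS):.2f}x")
    print(f"GPU: {throughput_fps(GPU_VIDEOCORE_MS):.1f} fps, "
          f"MvNCS: {throughput_fps(NCS_ALL_SHAVES_MS):.1f} fps")
```

With these numbers, enabling all 12 SHAVEs is about 5.2x faster than a single SHAVE, well short of a linear 12x, and about 2.96x faster than the VideoCore IV GPU.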

Support CNX Software – Donate via PayPal, become a Patron on Patreon, or buy review samples

Intel’s Movidius Neural Compute Stick Brings Low Power Deep Learning & Artificial Intelligence Offline

Intel has released several Compute Sticks over the years, which can be used as tiny Windows or Linux computers connected to the HDMI port of your TV or monitor, but the Movidius Neural Compute Stick is a completely different beast: it’s a deep learning inference kit and self-contained artificial intelligence (A.I.) accelerator that connects to the USB port of computers or laptops. Intel did not provide the full hardware specifications for the kit, but we do know the following:

- Vision Processing Unit – Intel Movidius Myriad 2 VPU with 12 VLIW 128-bit vector SHAVE processors @ 600 MHz optimized for machine vision, configurable hardware accelerators for image and vision processing; 28nm HPC process node; up to 100 gigaflops
- USB 3.0 type A port
- Power Consumption – Low power; the SoC has a 1W power profile
- Dimensions – 72.5mm x 27mm x 14mm

You can enter a trained Caffe, feed-forward Convolutional Neural Network (CNN) into the toolkit, profile it, then […]
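
A back-of-the-envelope look at what the spec figures imply. This Python sketch assumes the “up to 100 gigaflops” peak and the 1W SoC power profile can simply be divided, and, more speculatively, that the peak is spread evenly over the 12 SHAVEs at 600 MHz. Intel’s specifications state neither, and the configurable hardware accelerators likely contribute to the peak as well, so treat both results as rough upper bounds rather than measured figures.

```python
# Figures quoted in the Neural Compute Stick specifications.
PEAK_GFLOPS = 100.0     # "up to 100 gigaflops" (peak, not sustained)
SOC_POWER_W = 1.0       # "the SoC has a 1W power profile"
NUM_SHAVES = 12         # VLIW 128-bit vector SHAVE processors
SHAVE_CLOCK_GHZ = 0.6   # 600 MHz


def gflops_per_watt(peak_gflops: float, watts: float) -> float:
    """Peak compute divided by power: an upper bound on efficiency."""
    return peak_gflops / watts


def flops_per_cycle_per_core(peak_gflops: float, cores: int, clock_ghz: float) -> float:
    """Naive split that assumes the whole peak comes from `cores` equal cores.

    GFLOPS / (cores * GHz) leaves FLOPs per core per clock cycle.
    """
    return peak_gflops / (cores * clock_ghz)


if __name__ == "__main__":
    print(f"{gflops_per_watt(PEAK_GFLOPS, SOC_POWER_W):.0f} GFLOPS/W (peak)")
    print(f"{flops_per_cycle_per_core(PEAK_GFLOPS, NUM_SHAVES, SHAVE_CLOCK_GHZ):.1f} "
          "FLOPs/cycle per SHAVE (naive)")
```

Under those assumptions, the stick would deliver about 100 GFLOPS/W at peak, or roughly 13.9 floating-point operations per SHAVE per clock cycle.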


Intel DLIA is a PCIe Card Powered by Arria 10 FPGA for Deep Learning Applications

Intel has just launched their DLIA (Deep Learning Inference Accelerator) PCIe card powered by an Intel Arria 10 FPGA, aimed at accelerating CNN (convolutional neural network) workloads such as image recognition while lowering power consumption. Some of Intel DLIA’s hardware specifications:

- FPGA – Intel (previously Altera) Arria 10 FPGA @ 275 MHz delivering up to 1.5 TFLOPS
- System Memory – 2 banks 4G 64-bit DDR4
- PCIe – Gen3 x16 host interface; x8 electrical; x16 power & mechanical
- Form Factor – Full-length, full-height, single-width PCIe card
- Operating Temperature – 0 to 85 °C
- TDP – 50 to 75 Watts, hence the two cooling fans

The card is supported on CentOS 7.2, relies on the Intel Caffe framework and the Math Kernel Library for Deep Neural Networks (MKL-DNN), and works with various network topologies (AlexNet, GoogLeNet, CaffeNet, LeNet, VGG-16, SqueezeNet…). The FPGA is pre-programmed with the Intel Deep Learning Accelerator IP (DLA IP). Intel DLIA can be used by cloud service providers to filter content, track […]
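
For comparison with other accelerators, the DLIA’s quoted peak and TDP range bound its efficiency. The Python sketch below divides the 1.5 TFLOPS peak by each end of the 50 to 75 W TDP range; it also totals the memory, assuming “4G” means 4 GB per bank, which the specification line does not spell out. Peak-over-TDP is only a rough indicator, not a measured workload figure.

```python
from typing import Tuple

# Figures quoted in the Intel DLIA specifications.
PEAK_TFLOPS = 1.5                               # "up to 1.5 TFLOPS"
TDP_RANGE_W: Tuple[float, float] = (50.0, 75.0)  # "50 to 75 Watts"
MEMORY_BANKS = 2
BANK_SIZE_GB = 4  # assumption: "4G" read as 4 GB per bank


def gflops_per_watt_range(peak_tflops: float,
                          tdp_range_w: Tuple[float, float]) -> Tuple[float, float]:
    """Return (worst, best) peak GFLOPS per watt across the TDP range."""
    low_w, high_w = tdp_range_w
    peak_gflops = peak_tflops * 1000.0
    # The same peak at a higher power draw means lower efficiency,
    # so the top of the TDP range gives the worst case.
    return peak_gflops / high_w, peak_gflops / low_w


def total_memory_gb(banks: int, bank_size_gb: int) -> int:
    """Total DDR4 capacity across all banks."""
    return banks * bank_size_gb


if __name__ == "__main__":
    worst, best = gflops_per_watt_range(PEAK_TFLOPS, TDP_RANGE_W)
    print(f"{worst:.0f} to {best:.0f} GFLOPS/W (peak)")
    print(f"{total_memory_gb(MEMORY_BANKS, BANK_SIZE_GB)} GB DDR4 (assumed)")
```

This gives 20 to 30 GFLOPS/W at peak and 8 GB of memory under the stated assumption.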


GPU Accelerated Object Recognition on Raspberry Pi 3 & Raspberry Pi Zero

You’ve probably already seen one or more object recognition demos, where a system equipped with a camera detects the type of object using deep learning algorithms, either locally or in the cloud. This is used, for example, in autonomous cars to detect pedestrians, pets, other cars, and so on. Koichi Nakamura and his team have developed software based on the GoogLeNet deep neural network with a 1000-class image classification model. It runs on the Raspberry Pi Zero and Raspberry Pi 3 and leverages the VideoCore IV GPU found in Broadcom BCM283x processors to detect objects faster than with the CPU: about 3 times faster than using the four Cortex-A53 cores in the RPi 3. They simply connected a battery, a display, and the official Raspberry Pi camera to the Raspberry Pi boards to recognize various objects and animals. The first demo is with the Raspberry Pi Zero.

“Raspberry Pi Zero version” pic.twitter.com/5ALlnvFEe8, Koichi Nakamura (@9_ties), April […]
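
GoogLeNet ends in a 1000-way classifier, so the last step of any such demo is turning the network’s raw output scores into a ranked list of labels. The Python sketch below illustrates that generic softmax and top-k step on the CPU; it is not Koichi Nakamura’s code, and the labels (`class_0` … `class_999`) and scores are hypothetical placeholders, not real ImageNet classes or model output.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs: Sequence[float], labels: Sequence[str],
          k: int = 5) -> List[Tuple[str, float]]:
    """Return the k most likely (label, probability) pairs, highest first."""
    best = heapq.nlargest(k, range(len(probs)), key=lambda i: probs[i])
    return [(labels[i], probs[i]) for i in best]


if __name__ == "__main__":
    # Hypothetical 1000-class output: placeholder labels and made-up scores.
    labels = [f"class_{i}" for i in range(1000)]
    logits = [0.0] * 1000
    logits[281] = 8.0
    logits[285] = 6.5
    logits[282] = 5.0
    for name, p in top_k(softmax(logits), labels, k=3):
        print(f"{name}: {p:.3f}")
```

`heapq.nlargest` avoids sorting all 1000 probabilities when only the top few are displayed, though at this size a full sort would also be cheap.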
