Posts Tagged ‘deep learning’

Arm Research Summit 2017 Streamed Live on September 11-13

September 11th, 2017 2 comments

The Arm Research Summit is “an academic summit to discuss future trends and disruptive technologies across all sectors of computing”. The second edition of the event is taking place now in Cambridge, UK, and runs until September 13, 2017.


The agenda covers subjects such as architecture and memory, IoT, HPC, computer vision, machine learning, security, servers, and biotechnology. You can find the full detailed schedule for each day on the Arm website, and the good news is that the talks are streamed live on YouTube, so you can follow the ones that interest you from the comfort of your home or office.

Note that you can switch between rooms in the stream above by clicking on the <-> icon. Audio volume is a little low…

Thanks to Nobe for the tip.

Intel Introduces Movidius Myriad X Vision Processing Unit with Dedicated Neural Compute Engine

August 29th, 2017 No comments

Intel has just announced the third generation of Movidius Vision Processing Units (VPU), the Myriad X VPU, which the company claims is the world’s first SoC shipping with a dedicated Neural Compute Engine for accelerating deep learning inference at the edge, giving devices the ability to see, understand, and react to their environments in real time.

Movidius Myriad X VPU key features:

  • Neural Compute Engine – Dedicated on-chip accelerator for deep neural networks delivering over 1 trillion operations per second of DNN inferencing performance (based on peak floating-point computational throughput).
  • 16x programmable 128-bit VLIW Vector Processors (SHAVE cores) optimized for computer vision workloads.
  • 16x configurable MIPI Lanes – Connect up to 8 HD resolution RGB cameras for up to 700 million pixels per second of image signal processing throughput.
  • 20x vision hardware accelerators to perform tasks such as optical flow and stereo depth.
  • On-chip Memory – 2.5 MB homogeneous memory with up to 450 GB per second of internal bandwidth
  • Interfaces – PCIe Gen 3, USB 3.1
  • Packages
    • MA2085: No memory in-package; interfaces to external memory
    • MA2485: 4 Gbit LPDDR4 memory in-package

The hardware accelerators allow work to be offloaded from the Neural Compute Engine; for example, the stereo depth accelerator can simultaneously process 6 camera inputs (3 stereo pairs), each running at 720p resolution and a 60 Hz frame rate. The slide below also indicates that the Myriad X delivers 10x higher DNN performance than the Myriad 2 VPU found in the Movidius Neural Compute Stick.


The VPU ships with an SDK that contains software development frameworks, tools, drivers, and libraries to implement artificial intelligence applications, such as a specialized “FLIC framework with a plug-in approach to developing application pipelines including image processing, computer vision, and deep learning”, and a neural network compiler to port neural networks from Caffe, TensorFlow, and others.

Myriad SDK Architecture

More details can be found on Movidius’ Myriad X product page.

Movidius Neural Compute Stick Shown to Boost Deep Learning Performance by about 3 Times on Raspberry Pi 3 Board

August 9th, 2017 14 comments

Intel recently launched the Movidius Neural Compute Stick (MvNCS) for low-power, USB-based deep learning applications such as object recognition, and after some initial confusion, we could confirm the Neural Compute Stick can also be used on Arm-based platforms such as the Raspberry Pi 3. Kochi Nakamura, who wrote the code for GPU-accelerated object recognition on the Raspberry Pi 3 board, got hold of one sample in order to compare the performance of GPU and MvNCS acceleration.

His first attempt was quite confusing, as with GoogLeNet, Raspberry Pi 3 + MvNCS achieved an average inference time of about 560 ms, against 320 ms while using the VideoCore IV GPU in the RPi3 board. But then it was discovered that the official demo script would only use one core out of the 12 VLIW 128-bit vector SHAVE processors in Intel’s Movidius Myriad 2 VPU, and after enabling all 12 cores instead of just one, performance improved to around 108 ms average time per inference. That’s almost 3 times faster compared to using the GPU in the RPi3 for this specific demo, and it may vary for other demos / applications.

That’s the description on YouTube:

Comparison of deep learning inference acceleration by Movidius’ Neural Compute Stick (MvNCS) and by Idein’s software which uses Raspberry Pi’s GPU (VideoCore IV) without any extra computing resources.

Movidius’ demo runs GoogLeNet with 16-bit floating point precision. Average inference time is 108 ms.
We used MvNC SDK 1.07.07 and their official demo script without any changes. (ncapi/py_examples/stream_infer/)
It seems something is wrong with the inference results.
We recompiled the graph file with the -s12 option to use the 12 SHAVE vector processors simultaneously.

Idein’s demo also runs GoogLeNet with 32-bit floating point precision. Average inference time is 320ms.

It’s interesting to note that the GPU demo used 32-bit floating point precision, against 16-bit floating point precision on the Neural Compute Stick, although it’s unclear to me how that may affect the performance of such algorithms. Intel recommends a USB 3.0 interface for the MvNCS, while the Raspberry Pi 3 only comes with a USB 2.0 interface that shares its bandwidth between the USB webcam and the MvNCS, so it’s possible that an Arm board with a USB 3.0 interface for the stick, and a separate USB interface for the webcam, could perform better. Has anybody tested it? A USB 3.0 interface and hub would also make it possible to cascade several Neural Compute Sticks.
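For reference, here is a minimal sketch of what timing inference with the NCSDK 1.x Python API looks like, assuming a GoogLeNet graph pre-compiled with all 12 SHAVE cores enabled (the -s12 option mentioned above); the graph file name, input size, and iteration count are placeholders, not the exact demo code:

import time
import numpy
from mvnc import mvncapi as mvnc

devices = mvnc.EnumerateDevices()              # list attached Compute Sticks
device = mvnc.Device(devices[0])
device.OpenDevice()

with open('googlenet.graph', 'rb') as f:       # placeholder name for the pre-compiled graph
    graph = device.AllocateGraph(f.read())

# the stick expects fp16 input; a random 224x224 RGB image stands in for a camera frame
image = numpy.random.rand(224, 224, 3).astype(numpy.float16)

start = time.time()
for _ in range(100):
    graph.LoadTensor(image, 'user object')     # send the input tensor to the stick
    output, userobj = graph.GetResult()        # blocking call returning class probabilities
print('average inference time: %.1f ms' % ((time.time() - start) * 1000 / 100))

graph.DeallocateGraph()
device.CloseDevice()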

Intel’s Movidius Neural Compute Stick Brings Low Power Deep Learning & Artificial Intelligence Offline

July 21st, 2017 7 comments

Intel has released several Compute Sticks over the years, which can be used as tiny Windows or Linux computers connected to the HDMI port of your TV or monitor, but the Movidius Neural Compute Stick is a completely different beast, as it’s a deep learning inference kit and self-contained artificial intelligence (A.I.) accelerator that connects to the USB port of computers or laptops.

Intel did not provide the full hardware specifications for the kit, but we do know the following:

  • Vision Processing Unit – Intel Movidius Myriad 2 VPU with 12 VLIW 128-bit vector SHAVE processors @ 600 MHz optimized for machine vision; configurable hardware accelerators for image and vision processing; 28nm HPC process node; up to 100 gigaflops
  • USB 3.0 type A port
  • Power Consumption – Low power, the SoC has a 1W power profile
  • Dimensions – 72.5mm x 27mm x 14mm

You can feed a trained Caffe feed-forward Convolutional Neural Network (CNN) into the toolkit, profile it, then compile a tuned version ready for embedded deployment using the Intel/Movidius Neural Compute Platform API. Inference runs in real time on the stick itself, and no cloud connection is needed. You can even connect multiple Movidius Compute Sticks to the same computer to scale performance.
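As a rough illustration rather than official sample code, here is how the Neural Compute API’s Python bindings could enumerate several sticks and load the same compiled graph onto each of them; the graph file name is a placeholder:

from mvnc import mvncapi as mvnc

# read a graph file previously produced by the SDK's compiler (placeholder name)
with open('graph', 'rb') as f:
    graph_data = f.read()

devices = []
graphs = []
for name in mvnc.EnumerateDevices():           # one entry per attached Compute Stick
    device = mvnc.Device(name)
    device.OpenDevice()
    devices.append(device)
    graphs.append(device.AllocateGraph(graph_data))

# input tensors can now be round-robined across the graphs to scale throughput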

It can help bring artificial intelligence to drones, robots, security cameras, smart speakers, and anything else that can leverage deep learning. The video below also shows the USB Compute Stick connected to what looks like a development board, so the target platform does not need to be powerful, with most of the heavy processing happening inside the stick. The host currently does need to be an x86-64 computer running Ubuntu 16.04, so there is no Arm support yet.

Movidius Neural Compute Stick is sold for $79 via RS Components and Mouser. You’ll find the purchase links, getting started guide, and support forums on the Movidius Developer site.

Intel DLIA is a PCIe Card Powered by an Arria 10 FPGA for Deep Learning Applications

May 29th, 2017 No comments

Intel has just launched their DLIA (Deep Learning Inference Accelerator) PCIe card powered by an Intel Arria 10 FPGA, aimed at accelerating CNN (convolutional neural network) workloads such as image recognition while lowering power consumption.

Some of the Intel DLIA hardware specifications:

  • FPGA – Intel (previously Altera) Arria 10 FPGA @ 275 MHz delivering up to 1.5 TFLOPS
  • System Memory – 2 banks 4G 64-bit DDR4
  • PCIe – Gen3 x16 host interface; x8 electrical; x16 power & mechanical
  • Form Factor – Full-length, full-height, single wide PCIe card
  • Operating Temperature – 0 to 85 °C
  • TDP – 50 to 75 Watts, hence the two cooling fans

The card is supported on CentOS 7.2, relies on the Intel Caffe framework and the Math Kernel Library for Deep Neural Networks (MKL-DNN), and works with various network topologies (AlexNet, GoogLeNet, CaffeNet, LeNet, VGG-16, SqueezeNet…). The FPGA is pre-programmed with the Intel Deep Learning Accelerator IP (DLA IP).
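Since the card plugs into the Caffe framework, application code looks like ordinary Caffe inference. The sketch below uses the standard Caffe Python bindings with placeholder file names, and does not show how Intel’s Caffe fork dispatches layers to the DLA IP on the FPGA:

import numpy as np
import caffe

# load a GoogLeNet deploy definition and weights (placeholder file names)
net = caffe.Net('deploy.prototxt', 'googlenet.caffemodel', caffe.TEST)

# one 224x224 RGB image in NCHW layout; real code would load and preprocess a photo
image = np.random.rand(1, 3, 224, 224).astype(np.float32)
net.blobs['data'].data[...] = image

output = net.forward()
probs = output['prob'][0]                      # 1000-class softmax, assuming the standard deploy prototxt
print('top-1 class index:', probs.argmax())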

Intel DLIA can be used by cloud service providers to filter content and track product photos, in surveillance and security applications such as face recognition and license plate detection, in factories to detect defects automatically, and in retail stores to track foot traffic and monitor inventory.

You’ll find more details including links to get started and the SDK in the product page.

GPU Accelerated Object Recognition on Raspberry Pi 3 & Raspberry Pi Zero

April 30th, 2017 6 comments

You’ve probably already seen one or more object recognition demos, where a system equipped with a camera detects the type of object using deep learning algorithms, either locally or in the cloud. It’s for example used in autonomous cars to detect pedestrians, pets, other cars, and so on. Kochi Nakamura and his team have developed software based on the GoogLeNet deep neural network with a 1000-class image classification model, running on the Raspberry Pi Zero and Raspberry Pi 3, and leveraging the VideoCore IV GPU found in the Broadcom BCM283x processor in order to detect objects faster than with the CPU, more precisely about 3 times faster than using the four Cortex-A53 cores in the RPi 3.

They just connected a battery, a display, and the official Raspberry Pi camera to the Raspberry Pi boards to be able to recognize various objects and animals.

The first demo is with Raspberry Pi Zero.

The second demo is on the Raspberry Pi 3 board, using a better display.

Source code? Not yet, but he is thinking about it, and when/if it is released, it will probably be found on his GitHub account, where the py-videocore Python library for GPGPU on the Raspberry Pi is already available, and was very likely used in the demos above. They may also have used the TensorFlow image recognition tutorials as a starting point, and/or instructions to install TensorFlow on the Raspberry Pi.
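For context, the TensorFlow image recognition tutorial boils down to loading a frozen classification graph and running it on an image, roughly as in the sketch below (TF 1.x API, current at the time of writing). The file and tensor names are assumptions that depend on the exported model, and this is not Idein’s GPU-accelerated code:

import numpy as np
import tensorflow as tf

# load a frozen classification graph (placeholder file name)
with tf.gfile.GFile('frozen_graph.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name='')

# a random 224x224 RGB image stands in for a real camera frame
image = np.random.rand(1, 224, 224, 3).astype(np.float32)

with tf.Session(graph=graph) as sess:
    # tensor names are assumptions and depend on how the model was exported
    probs = sess.run('softmax:0', feed_dict={'input:0': image})
print('top-1 class index:', probs[0].argmax())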

If you are interested in deep learning, there’s a good list of resources with links to research papers, software frameworks & applications, tutorials, etc… on GitHub’s .