Posts Tagged ‘deep learning’

JeVois Smart Machine Vision Camera Review – Part 1: Developer / Robotics Kit Unboxing

October 24th, 2017 No comments

The JeVois-A33 computer vision camera was unveiled at the end of last year through a Kickstarter campaign. Powered by an Allwinner A33 quad-core Cortex-A7 processor and a 1.3MP camera sensor, the system can detect motion, track faces and eyes, detect & decode ArUco markers & QR codes, follow lines for autonomous cars, and more, thanks to the JeVois framework.

Most rewards from the Kickstarter campaign shipped in April of this year, so it’s quite possible some regular readers of this blog are already familiar with the camera. But the developer (Laurent Itti) contacted me again recently, explaining they had improved the software with Python support and new features, such as the ability to run deep neural networks directly on the processor inside the smart camera. He also wanted to send a review sample, which I received today, but I got a bit more than I expected, so I’ll start the review with an unboxing of what they call the “Developer / Robotics Kit”.
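The new Python support makes it easy to consume the camera’s results programmatically. As a hedged sketch, the parser below assumes a simple “N2 id x y w h” text layout similar to JeVois’ standardized 2D serial messages; check your module’s documentation for the exact format it emits:

```python
def parse_detection(line):
    """Parse one assumed JeVois 2D serial message ("N2 id x y w h") into a dict."""
    fields = line.split()
    if len(fields) != 6 or fields[0] != "N2":
        return None  # not a 2D detection message
    try:
        x, y, w, h = (int(v) for v in fields[2:6])
    except ValueError:
        return None  # malformed coordinates
    return {"id": fields[1], "x": x, "y": y, "w": w, "h": h}
```

In practice you would read such lines from the camera’s serial port (e.g. with pyserial) while the camera streams annotated video over USB as a regular webcam.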

The kit came in a plain white package, so I’ll skip the package photo and go straight to the contents.

Click to Enlarge

I was really expecting to receive a tiny camera, and not much else. So my first reaction was: “what!?” 🙂

You’ll find five USB cables inside (from top left to bottom middle):


  • USB to micro serial adapter cable, 1m long, to access the serial console in the camera when running in debug mode, or while troubleshooting Arduino code
  • mini USB + micro USB splitter cable, 15cm long, to power both the camera and Arduino board from the power bank
  • mini USB Y cable, 80cm long, to power the board via two USB 2.0 ports or one USB 3.0 port on your host computer
  • mini USB cable, 23cm long, to power the camera from a USB port or power bank.
  • mini USB cable, 75cm long, to connect the camera to one USB 3.0 port or power bank.

The kit also includes an 8GB micro SD card pre-loaded with JeVois software, an SD adapter, a micro SD card reader, a 5V USB tester compatible with QuickCharge 2.0 to monitor the power consumption of the camera with your chosen algorithm, a 2,600 mAh power bank (large enough to power the camera for several hours), an Arduino compatible Pro mini board based on Microchip Atmel Atmega 32U4 MCU, and a business card providing useful information such as a link to a Quick Start Guide.

Oh… I almost forgot. Can you see the “fan” in the middle of the photo above? That’s the actual JeVois-A33 camera. I knew it was small, but once you hold it in your hands, you realize how tiny it is. The cable on the left of the camera is a micro serial cable to connect to an MCU board.

Click to Enlarge

The back of the camera features all the ports and connectors with a micro SD slot, a mini USB port, the micro serial port connector (which looks like a battery connector), and a dual color LED on the left of the micro serial connector that indicates power and camera status.

Click to Enlarge

The bottom reveals an opening to cool down the AXP223 PMIC.

Click to Enlarge

If you’re interested in the exact developer/robotics kit I’ve received, you can purchase it for $99.99 on JeVois, Amazon, or RobotShop (with locations in the US, Canada, Japan, and France). But if you just want the camera without all the cables and accessories, $49.99 will do.

Google’s Teachable Machine is a Simple and Fun Way to Understand How Machine Learning Works

October 9th, 2017 4 comments

Artificial intelligence, machine learning, deep learning, neural networks… are all words we hear more and more these days, as machines gain the ability to recognize objects, answer voice requests / commands, and so on. But many people may not know the basics of how machine learning works at all, and with that in mind, Google launched the Teachable Machine website to let people experiment with and understand the basics of machine learning without having to install an SDK or even write code.

So I quickly tried it with Google Chrome, as it did not seem to work with Mozilla Firefox. It’s best to have audio on, as a voice explains how to use it.

Basically, you connect your webcam, authorize Chrome to use it, and you should see the image in the input section on the left. Then you can train the machine in the learning section in the middle, which has three different classes. You’ll be asked to wave your hand and keep pressing the “Train Green” button until you have at least 100 examples. At this stage, the machine will always detect the green class, since that’s all it knows. Then you can train the purple class by staying still, again making sure you have at least 100 examples before releasing the button. Now the machine should be able to detect whether you are staying still or moving, with a varying percentage of confidence. The output section will just show some animated GIFs, play sounds, or speak words depending on what it detects. It can learn actions (staying still, waving hands, clapping hands) and object detection. My webcam is pretty bad, but if you have a good image, you should also be able to detect feelings like happiness, sadness, anger, anxiousness, etc… Give it a try, it’s fun.
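The behavior described above, train a class by holding a button, then classify live frames with a confidence percentage, can be approximated with nearest-neighbor matching on image feature vectors. This toy Python version uses raw feature vectors rather than the neural network embeddings deeplearn.js computes, purely to illustrate the idea:

```python
import numpy as np

class TinyTeachableMachine:
    def __init__(self):
        self.examples = []  # list of (feature_vector, class_name) pairs

    def train(self, features, class_name):
        # Each moment the "Train" button is held adds one example like this
        self.examples.append((np.asarray(features, dtype=float), class_name))

    def classify(self, features, k=10):
        """Return (class, confidence) from a vote of the k nearest examples."""
        x = np.asarray(features, dtype=float)
        nearest = sorted(self.examples,
                         key=lambda e: np.linalg.norm(e[0] - x))[:k]
        votes = [name for _, name in nearest]
        best = max(set(votes), key=votes.count)
        return best, votes.count(best) / len(votes)
```

This also explains why the machine always answers “green” after only one class is trained: with a single class in the example set, every vote goes to it.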

The Teachable Machine was built with deeplearn.js, a new open source hardware-accelerated JavaScript machine learning library, and Google released the source code for the website too.

NVIDIA Unveils Open Source Hardware NVDLA Deep Learning Accelerator

October 4th, 2017 2 comments

NVIDIA is not exactly known for their commitment to open source projects, although to be fair, things have improved since Linus Torvalds gave them the finger a few years ago. They still don’t seem to help much with the Nouveau drivers, but I’ve usually read positive feedback about Linux support on their Jetson boards.

So this morning I was quite surprised to read the company had launched NVDLA (NVIDIA Deep Learning Accelerator), a “free and open architecture that promotes a standard way to design deep learning inference accelerators”.

Comparison of two possible NVDLA systems – Click to Enlarge

The project is based on the Xavier hardware architecture designed for automotive products, is scalable from small to large systems, and is said to be a complete solution with Verilog and C-model for the chip, Linux drivers, test suites, kernel- and user-mode software, and software development tools all available on GitHub’s NVDLA account. The project is not released under a standard open source license like MIT, BSD or GPL, but instead under NVIDIA’s own Open NVDLA license.

This is an ongoing project, and NVIDIA has published a roadmap through H1 2018, at which point we should get FPGA support for accelerating software development, as well as support for TensorRT and other frameworks.

Via Phoronix

Imagination Announces PowerVR Series2NX Neural Network Accelerator (NNA), and PowerVR Series9XE and 9XM GPUs

September 21st, 2017 3 comments

Imagination Technologies has just made two announcements: one for their PowerVR Series2NX neural network accelerator, and the other for two new GPU families: PowerVR Series9XE and 9XM.

PowerVR Series2NX neural network accelerator

Click to Enlarge

The company claims the 2NX can deliver twice the performance at half the bandwidth of its nearest competitor, and that it’s the first dedicated hardware solution with flexible bit-depth support from 16-bit down to 4-bit.

Key benefits of their solution (based on market data available in August 2017 from a variety of sources) include:

  • Highest inference/mW IP cores to deliver the lowest power consumption
  • Highest inference/mm2 IP cores to enable the most cost-effective solutions
  • Lowest bandwidth solution with support for fully flexible bit depth for weights and data including low bandwidth modes down to 4-bit
  • 2048 MACs/cycle in a single core, with the ability to go to higher levels with multi core

The PowerVR 2NX NNA is expected to be found in smartphones and other mobile devices leveraging the TensorFlow Lite API for Android, as well as the Caffe2Go framework; in smart surveillance cameras; in assisted and autonomous driving solutions; and in home entertainment, with TVs and set-top boxes using artificial intelligence to adapt preferences to specific users. NNAs will find their way into more and more SoCs, as shown in the diagram below from Imagination illustrating the evolution of SoCs over the years. This has already started, as we’ve seen with the Huawei Kirin 970 mobile SoC featuring its own neural processing unit (likely not a 2NX though).

Click to Enlarge

PowerVR 2NX development resources include mapping and tuning tools, sample networks, evaluation tools, and documentation, leveraging industry-standard machine learning frameworks such as Caffe and TensorFlow. The Imagination DNN (Deep Neural Network) API, working across multiple SoC configurations, should ease transitions between CPU, GPU, and NNA.

The PowerVR 2NX NNA is available for licensing now, which means products with the solution should start appearing sometime in 2018. More details about the 2NX can be found in the company’s blog post and on the product page.

PowerVR Series9XE and 9XM GPUs

Click to Enlarge

The Series9XE GPU family is an update to the previous-generation Series8XE family with the same fill-rate density but up to 20% better application performance. The GPUs are expected to be used in cost-sensitive products such as digital TVs, set-top boxes, streaming sticks/dongles, and entry-level to mid-range mobile phones and tablets.

The Series9XM family improves performance by up to 50% over the Series8XEP family thanks to increased compute density, and should be found in premium set-top boxes, mid-range smartphones, tablets, and automotive ADAS applications.

Both families benefit from improvements in the memory subsystem, reducing bandwidth by as much as 25%, include a new MMU, standard support for 10-bit YUV, and are suitable for 4K output/displays.

Some of the key benefits of the new Series9XE/9XM family include:

  • Performance/mm2
    • 9XE GPUs’ improved gaming performance while maintaining the same fillrate density compared to the previous generation
    • 9XM GPUs’ several new and enhanced architectural elements enable up to 70% better performance density than the competition (as of August 2017), and up to 50% better than the previous 8XEP generation
  • Bandwidth savings of up to 25% over the previous generation GPUs through architectural enhancements including parameter compression and tile grouping
  • Memory system improvements: 36-bit addressing for improved system integration, improved burst sizes for efficient memory accesses, and enhanced compression capabilities
  • Low power consumption thanks to Imagination’s Tile Based Deferred Rendering (TBDR) technology
  • Support for hardware virtualization and Imagination’s OmniShield multi-domain security, enabling customers to build systems in which applications and operating systems can run independently and reliably on the same platform
  • Support for Khronos graphics APIs: OpenGL ES 3.2, and Vulkan 1.0
  • Support for advanced compute and vision APIs such as RenderScript, OpenVX 1.1 and OpenCL 1.2 EP
  • Optional support for PVRIC3 PowerVR lossless image compression technology

The company also explains the Series9XE/9XM GPUs are ideal for use with the new PowerVR 2NX Neural Network Accelerator, which means NNAs will not only be found in premium devices, but also in entry-level and mid-range products.

The IP is available for licensing now with four Series9XE GPU IP cores:

  • 1 PPC with 16 FP32 FLOPS/clock (GE9000)
  • 2 PPC with 16 FP32 FLOPS/clock (GE9100)
  • 4 PPC with 32 FP32 FLOPS/clock (GE9210)
  • 8 PPC with 64 FP32 FLOPS/clock (GE9420)

and three Series9XM GPU IP cores:

  • 4 PPC with 64 FP32 FLOPS/clock (GM9220)
  • 4 PPC with 128 FP32 FLOPS/clock (GM9240)
  • 8 PPC with 128 FP32 FLOPS/clock (GM9240)
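PPC (pixels per clock) and FLOPS/clock are per-cycle figures, so absolute throughput depends on the clock frequency each licensee chooses for its SoC. A quick sketch of the arithmetic, where the 600 MHz clock is purely an assumption for illustration, not a figure from Imagination:

```python
def gpu_throughput(ppc, flops_per_clock, clock_hz):
    """Convert per-clock figures into fill rate (Gpixel/s) and compute (GFLOPS)."""
    return ppc * clock_hz / 1e9, flops_per_clock * clock_hz / 1e9

# e.g. the top Series9XE core, GE9420 (8 PPC, 64 FP32 FLOPS/clock),
# at a hypothetical 600 MHz:
fill, gflops = gpu_throughput(8, 64, 600e6)
# fill ≈ 4.8 Gpixel/s, gflops ≈ 38.4 GFLOPS
```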

Visit the product page for more details about the new PowerVR GPU families.

Arm Research Summit 2017 Streamed Live on September 11-13

September 11th, 2017 2 comments

The Arm Research Summit is “an academic summit to discuss future trends and disruptive technologies across all sectors of computing”, with the second edition of the event taking place now in Cambridge, UK until September 13, 2017.

Click to Enlarge

The agenda includes various subjects such as architecture and memory, IoT, HPC, computer vision, machine learning, security, servers, biotechnology and others. You can find the full detailed schedule for each day on the Arm website, and the good news is that the talks are streamed live on YouTube, so you can follow the talks that interest you from the comfort of your home/office.

Note that you can switch between rooms in the stream above by clicking on the <-> icon. Audio volume is a little low…

Thanks to Nobe for the tip.

Intel Introduces Movidius Myriad X Vision Processing Unit with Dedicated Neural Compute Engine

August 29th, 2017 No comments

Intel has just announced the Myriad X VPU, the third generation of Movidius Vision Processing Units (VPUs), which the company claims is the world’s first SoC shipping with a dedicated Neural Compute Engine for accelerating deep learning inference at the edge, giving devices the ability to see, understand and react to their environments in real time.

Movidius Myriad X VPU key features:

  • Neural Compute Engine – Dedicated on-chip accelerator for deep neural networks delivering over 1 trillion operations per second of DNN inferencing performance (based on peak floating-point computational throughput).
  • 16x programmable 128-bit VLIW Vector Processors (SHAVE cores) optimized for computer vision workloads.
  • 16x configurable MIPI Lanes – Connect up to 8 HD resolution RGB cameras for up to 700 million pixels per second of image signal processing throughput.
  • 20x vision hardware accelerators to perform tasks such as optical flow and stereo depth.
  • On-chip Memory – 2.5 MB homogeneous memory with up to 450 GB per second of internal bandwidth
  • Interfaces – PCIe Gen 3, USB 3.1
  • Packages
    • MA2085: No memory in-package; interfaces to external memory
    • MA2485: 4 Gbit LPDDR4 memory in-package
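As a sanity check of the ISP figure above, a 700 Mpixel/s budget spread across eight simultaneous camera streams works out to roughly 42 frames per second per camera. Note the assumption here that “HD resolution” means 1920×1080:

```python
pixels_per_frame = 1920 * 1080   # one full HD frame (assumed "HD" size)
cameras = 8                      # maximum simultaneous RGB cameras per the spec
isp_budget = 700e6               # pixels per second of ISP throughput

fps_per_camera = isp_budget / (cameras * pixels_per_frame)
# fps_per_camera ≈ 42
```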

The hardware accelerators allow work to be offloaded from the Neural Compute Engine; for example, the stereo depth accelerator can simultaneously process 6 camera inputs (3 stereo pairs), each at 720p resolution and a 60 Hz frame rate. The slide below also indicates the Myriad X has 10x higher DNN performance compared to the Myriad 2 VPU found in the Movidius Neural Compute Stick.

Click to Enlarge

The VPU ships with an SDK that contains software development frameworks, tools, drivers and libraries to implement artificial intelligence applications, such as a specialized “FLIC framework with a plug-in approach to developing application pipelines including image processing, computer vision, and deep learning”, and a neural network compiler to port neural networks from Caffe, Tensorflow, and others.

Myriad SDK Architecture

More details can be found on Movidius’ MyriadX product page.

Movidius Neural Compute Stick Shown to Boost Deep Learning Performance by about 3 Times on Raspberry Pi 3 Board

August 9th, 2017 14 comments

Intel recently launched the Movidius Neural Compute Stick (MvNCS) for low-power, USB-based deep learning applications such as object recognition, and after some initial confusion, we could confirm the Neural Compute Stick can also be used on Arm-based platforms such as the Raspberry Pi 3. Kochi Nakamura, who wrote the code for GPU-accelerated object recognition on the Raspberry Pi 3 board, got hold of a sample in order to compare the performance of GPU and MvNCS acceleration.

His first attempt was quite confusing, as with GoogLeNet, the Raspberry Pi 3 + MvNCS achieved an average inference time of about 560 ms, against 320 ms using the VideoCore IV GPU in the RPi3 board. But it was then discovered that the demo would only use one core out of the 12 VLIW 128-bit vector SHAVE processors in Intel’s Movidius Myriad 2 VPU, and after enabling all 12 cores instead of just one, performance improved to around 108 ms average time per inference. That’s almost 3 times faster than using the GPU in the RPi3 for this specific demo, and results may vary for other demos / applications.
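The “almost 3 times” figure is simply the ratio of the quoted average inference times, and the same arithmetic shows how much the 1-to-12 SHAVE core fix mattered:

```python
gpu_ms = 320           # VideoCore IV GPU average inference time (ms)
one_shave_ms = 560     # MvNCS with a single SHAVE core enabled
twelve_shave_ms = 108  # MvNCS with all 12 SHAVE cores enabled

speedup_vs_gpu = gpu_ms / twelve_shave_ms            # ≈ 2.96x, the "almost 3x"
speedup_from_cores = one_shave_ms / twelve_shave_ms  # ≈ 5.2x from enabling 12 cores
```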

This is the description on YouTube:

Comparison of deep learning inference acceleration by Movidius’ Neural Compute Stick (MvNCS) and by Idein’s software which uses Raspberry Pi’s GPU (VideoCore IV) without any extra computing resources.

Movidius’ demo runs GoogLeNet with 16-bit floating point precision. Average inference time is 108ms.
We used MvNC SDK 1.07.07 and their official demo script without any changes. (ncapi/py_examples/stream_infer/
It seems something is wrong with the inference results.
We recompiled the graph file with the -s12 option to use the 12 SHAVE vector processors simultaneously.

Idein’s demo also runs GoogLeNet with 32-bit floating point precision. Average inference time is 320ms.

It’s interesting to note the GPU demo used 32-bit floating point precision, against 16-bit floating point precision on the Neural Compute Stick, although it’s unclear to me how that may affect the accuracy of such algorithms. Intel recommends a USB 3.0 interface for the MvNCS, and the Raspberry Pi 3 only comes with a USB 2.0 interface that shares bandwidth between the USB webcam and the MvNCS, so it’s possible an Arm board with a USB 3.0 interface for the stick, and a separate USB interface for the webcam, could perform better. Has anybody tested it? A USB 3.0 interface and hub would also allow cascading several Neural Compute Sticks.

Intel’s Movidius Neural Compute Stick Brings Low Power Deep Learning & Artificial Intelligence Offline

July 21st, 2017 8 comments

Intel has released several Compute Sticks over the years, which can be used as tiny Windows or Linux computers connected to the HDMI port of your TV or monitor, but the Movidius Neural Compute Stick is a completely different beast, as it’s a deep learning inference kit and self-contained artificial intelligence (A.I.) accelerator that connects to the USB port of computers or laptops.

Intel did not provide the full hardware specifications for the kit, but we do know the following:

  • Vision Processing Unit – Intel Movidius Myriad 2 VPU with 12 VLIW 128-bit vector SHAVE processors @ 600 MHz optimized for machine vision, and configurable hardware accelerators for image and vision processing; 28nm HPC process node; up to 100 gigaflops
  • USB 3.0 type A port
  • Power Consumption – Low power, the SoC has a 1W power profile
  • Dimensions – 72.5mm x 27mm x 14mm

You can feed a trained Caffe feed-forward Convolutional Neural Network (CNN) into the toolkit, profile it, then compile a tuned version ready for embedded deployment using the Intel/Movidius Neural Compute Platform API. Inference occurs in real time on the stick itself, and no cloud connection is needed. You can even connect multiple Movidius Compute Sticks to the same computer to scale performance.
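As a hedged sketch of that deployment flow, the snippet below follows the Python bindings from the first-generation Movidius Neural Compute SDK (the EnumerateDevices / AllocateGraph / LoadTensor / GetResult call names may differ in other SDK releases, so treat them as assumptions); the top_k helper is my own illustration of decoding the raw output vector:

```python
import numpy as np

def top_k(output, labels, k=5):
    """Decode a raw 1-D output vector into the k best (label, score) pairs."""
    order = np.argsort(output)[::-1][:k]   # indices sorted by descending score
    return [(labels[i], float(output[i])) for i in order]

def classify_on_stick(image, graph_path="graph"):
    """Run one inference on the stick (NCAPI v1 call names, assumed)."""
    from mvnc import mvncapi as mvnc       # ships with the Movidius NC SDK
    devices = mvnc.EnumerateDevices()      # find attached Compute Sticks
    device = mvnc.Device(devices[0])
    device.OpenDevice()
    with open(graph_path, "rb") as f:      # graph file compiled from a Caffe CNN
        graph = device.AllocateGraph(f.read())
    graph.LoadTensor(image.astype(np.float16), "user")  # queue the inference
    output, _ = graph.GetResult()          # blocks until the stick answers
    graph.DeallocateGraph()
    device.CloseDevice()
    return output
```

Scaling to several sticks would simply mean opening one Device per entry returned by the device enumeration call and distributing frames across them.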

It can help bring artificial intelligence to drones, robots, security cameras, smart speakers, and anything else that can leverage deep learning. The video below also shows the USB Compute Stick connected to what looks like a development board, so the target platform does not need to be powerful, with most of the heavy processing done inside the stick. The development host does currently need to be an x86-64 computer running Ubuntu 16.04, so there is no Arm support for the SDK yet.

The Movidius Neural Compute Stick is sold for $79 via RS Components and Mouser. You’ll find the purchase links, getting started guide, and support forums on the Movidius developer site.