Posts Tagged ‘deep learning’

Rockchip RK3399Pro SoC Integrates a 2.4 TOPS Neural Network Processing Unit for Artificial Intelligence Applications

January 8th, 2018 7 comments

Rockchip RK3399 (aka OP1) SoC was launched in 2016 with a hexa-core Arm Cortex A72/A53 processor, Mali-T860MP4 GPU, support for 4K video decoding, and high speed interfaces like USB 3.0 and PCIe, as well as Gigabit Ethernet. The processor is found in Chromebooks, TV boxes, development boards, and other devices.

The company has unveiled an upgraded “Pro” version of the processor at CES 2018. Rockchip RK3399Pro appears to have most of the same features as its predecessor but adds a neural network processing unit (NPU) delivering up to 2.4 TOPS for artificial intelligence and deep learning applications.

The company claims that compared to traditional solutions, the computing performance of typical deep neural network models such as Inception V3, ResNet34 and VGG16 on RK3399Pro is improved by almost one hundred times, and power consumption is less than 1% of that of A.I. solutions implemented using GPU acceleration.

Based on the information provided in the chart above (source: elDEE on twitter), Rockchip RK3399Pro outperforms other high-end SoCs (for such tasks) including Apple A11 and Huawei Kirin 970, both of which also feature an NPU, and even offers better performance than NVIDIA TX2.

RK3399Pro NPU supports 8-bit & 16-bit operations, is compatible with various AI software frameworks and APIs, including OpenVX and TensorFlow Lite/AndroidNN API, and comes with AI software tools capable of handling Caffe/TensorFlow models. An RK3399Pro hardware reference design can also be provided to speed up development, but no details were provided.
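To give a feel for what 8-bit operation means for a trained model, here is a minimal sketch of affine quantization, where float weights are mapped to unsigned 8-bit codes and back. This is a generic textbook scheme used for illustration only: Rockchip has not detailed how the RK3399Pro tools quantize models, and the function names below are hypothetical.

```python
def quantize_uint8(values):
    """Map floats to 0..255 with an affine scale and zero point (generic scheme, assumed)."""
    lo, hi = min(values), max(values)
    # Fall back to a scale of 1.0 for a constant tensor, which would give a zero scale.
    scale = (hi - lo) / 255.0 or 1.0
    zero_point = round(-lo / scale)
    # Clamp so that rounding at the edges never leaves the 8-bit range.
    codes = [max(0, min(255, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Recover approximate floats from the 8-bit codes."""
    return [(c - zero_point) * scale for c in codes]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, scale, zero_point = quantize_uint8(weights)
restored = dequantize(codes, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, f"max round-trip error: {max_error:.4f}")
```

The round-trip error stays in the order of one quantization step, which is the kind of precision loss traded for the smaller and cheaper 8-bit arithmetic.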


Via Liliputing

Laceli AI Compute Stick is a More Powerful & Efficient Alternative to Intel/Movidius Neural Compute Stick

January 3rd, 2018 3 comments

Intel’s Movidius Neural Compute Stick is a low power deep learning inference kit and “self-contained” artificial intelligence (A.I.) accelerator that connects to the USB port of computers or development boards like Raspberry Pi 3, delivering three times more performance than a solution accelerated with VideoCore IV GPU.

So far it was the only A.I. USB stick solution that I had heard of, but Gyrfalcon Technology, a US startup funded at the beginning of last year, has developed its own “artificial intelligence processor” with Lightspeeur 2801S, as well as a neural USB compute stick featuring the solution: Laceli AI Compute Stick.

The company claims Laceli AI Compute Stick delivers 2.8 TOPS (trillion operations per second) within 0.3 Watt of power, which is 90 times more efficient than the Movidius USB Stick that can deliver 100 GFLOPS (0.1 TOPS) within 1 Watt of power.
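The "90 times" figure can be checked with a quick calculation of performance per watt, using only the numbers quoted above. The helper name is just for illustration.

```python
def tops_per_watt(tops, watts):
    """Energy efficiency as tera-operations per second per watt."""
    return tops / watts


laceli = tops_per_watt(2.8, 0.3)    # Gyrfalcon's claim for the Laceli stick
movidius = tops_per_watt(0.1, 1.0)  # 100 GFLOPS within 1 Watt, as quoted above
ratio = laceli / movidius
print(f"Laceli: {laceli:.2f} TOPS/W, Movidius: {movidius:.2f} TOPS/W, ratio: {ratio:.0f}x")
```

The ratio comes out slightly above 90, so the claim is consistent with the quoted figures. Keep in mind that this compares operations per second with floating-point operations per second, so it is not a strict like-for-like comparison.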

Information about the processor and stick is rather limited, but Gyrfalcon explains their APiM architecture (AI Processing in Memory) uses memory as the AI processing unit, which eliminates the large data movements that result in high power consumption. The processor comes with 28,000 parallel computing cores and does not require external memory. The stick is also equipped with 4GB storage, a USB 3.0 port, and works with Caffe and TensorFlow.

Movidius Neural Compute Stick vs Laceli AI Compute Stick

Going to the company website, we’ll also find a complete development kit with USB Interface, eMMC flash, and special access port, as well as a multi-chip board with PCIe and M.2 Interfaces that appears to combine eight Lightspeeur 2801S AI processors.

The processor is already in production, and available to “qualified customers”, while the Laceli AI Compute Stick will first be showcased at CES 2018 in Las Vegas in a few days.

Thanks to TLS for the tip.

Khronos Group Releases Neural Network Exchange Format (NNEF) 1.0 Specification

December 24th, 2017 3 comments

The Khronos Group, the organization behind widely used standards for graphics, parallel computing, and vision processing such as OpenGL, Vulkan, and OpenCL, has recently published the provisional NNEF 1.0 (Neural Network Exchange Format) specification for universal exchange of trained neural networks between training frameworks and inference engines.

NNEF aims to reduce machine learning deployment fragmentation by enabling data scientists and engineers to easily transfer trained networks from their chosen training framework into various inference engines via a single standardized exchange format. NNEF encapsulates a complete description of the structure, operations and parameters of a trained neural network, independent of the training tools used to produce it and the inference engine used to execute it. The new format has already been tested with tools such as TensorFlow, Caffe2, Theano, Chainer, MXNet, and PyTorch.

Khronos has also released open source tools to manipulate NNEF files, including an NNEF syntax parser/validator, and example exporters, which can be found on the NNEF Tools GitHub repository. The provisional NNEF 1.0 specification can be found here, and you can use the issue tracker on the NNEF-Docs GitHub repository to provide feedback, or leave comments.

Merry Xmas to all!

Qualcomm Developer’s Guide to Artificial Intelligence (AI)

December 21st, 2017 3 comments

Artificial intelligence (AI) comes with many terms like ML (Machine Learning), DL (Deep Learning), CNN (Convolutional Neural Network), ANN (Artificial Neural Networks), etc., and is currently made possible via frameworks such as TensorFlow, Caffe2 or ONNX (Open Neural Network Exchange).

If you have not looked into the details, all those terms may be confusing, so Qualcomm Developer Network has released a 9-page e-Book entitled “A Developer’s Guide to Artificial Intelligence (AI)” that gives an overview of all the terms, what they mean, and how they differ.

For example, they explain that a key difference between Machine Learning and Deep Learning is that with ML, the input features of the CNN are determined by humans, while DL requires less human intervention. The book also explains that AI is moving to the edge / on-device for low latency and better reliability, instead of relying on the cloud.

It also quickly goes through the workflow using the Snapdragon NPE SDK in a total of 4 steps: 3 done on your build machine (training, conversion to DLC (Deep Learning Container) format, and addition of the NPE runtime to the app), followed by the final step, loading and running the model on the target device.

$45 AIY Vision Kit Adds Accelerated Computer Vision to Raspberry Pi Zero W Board

December 1st, 2017 2 comments

AIY Projects is an initiative launched by Google that aims to bring do-it-yourself artificial intelligence to the maker community by providing affordable development kits to get started with the technology. The first project was AIY Projects Voice Kit, which basically transformed a Raspberry Pi 3 board into a Google Home device by adding the necessary hardware to support Google Assistant SDK, and an enclosure.

The company has now launched another maker kit with AIY Project Vision Kit that adds a HAT board powered by Intel/Movidius Myriad 2 VPU to Raspberry Pi Zero W, in order to accelerate image & object recognition using TensorFlow’s machine learning models.

The kit includes the following items:

  • Vision Bonnet accessory board powered by Myriad 2 VPU (MA2450)
  • 2x 11mm plastic standoffs
  • 24mm RGB arcade button and nut
  • 1x Privacy LED
  • 1x LED bezel
  • 1x 1/4/20 flanged nut
  • Lens, lens washer, and lens magnet
  • 50 mil ribbon cable
  • Pi0 camera flat flex cable
  • MIPI flat flex cable
  • Piezo buzzer
  • External cardboard box and internal cardboard frame

Vision Bonnet Board

Note that the accessory board features the same Movidius VPU as Intel Neural Compute Stick, which has been used with Raspberry Pi 3, and shown to deliver about 3 times the performance compared to a GPGPU implementation leveraging VideoCore IV GPU.

Back to the kit. You’ll need to add your own Raspberry Pi Zero W, Raspberry Pi camera 2, and blank SD card (at least 4 GB) to complete the kit. Follow the assembly guide, and the final results should look like this:


Once this is done, flash the Vision Kit SD image (available soon) to your micro SD card, insert it into your Raspberry Pi Zero W, and connect the power. The software image will include three neural network models:

  • A model based on MobileNets that can recognize a thousand common objects.
  • A model for face detection capable of detecting faces and facial expressions (sadness, joy, etc…)
  • A model for discerning between cats, dogs and people.

The system will be able to run at speeds of up to 30 fps, providing near real-time performance. TensorFlow code and a compiler will also be included for people wanting to have their own models. A Python API will be provided to customize the RGB button colors, piezo element sounds, and (4x) GPIO pins.

AIY Vision Kit is up for pre-order for $44.99 at Micro Center with shipping planned for early December. Just like AIY Voice Kit, we should eventually expect international availability via other websites such as Pimoroni or Seeed Studio. The complete kit with RPi board, camera, and accessories should cost around $90.

AWS DeepLens is a $249 Deep Learning Video Camera for Developers

November 30th, 2017 4 comments

Amazon Web Services (AWS) has launched DeepLens, the “world’s first deep learning enabled video camera for developers”. Powered by an Intel Atom X5 processor with 8GB RAM, and featuring a 4MP (1080p) camera, the fully programmable system runs Ubuntu 16.04, and is designed to expand the deep learning skills of developers, with Amazon providing tutorials, code, and pre-trained models.

AWS DeepLens specifications:

  • Camera – 4MP (1080p) camera using MJPEG, H.264 encoding
  • Video Output – micro HDMI port
  • Audio – 3.5mm audio jack, and HDMI audio
  • Connectivity – Dual band WiFi
  • USB – 2x USB 2.0 ports
  • Misc – Power button; camera, WiFi and power status LEDs; reset pinhole
  • Power Supply – TBD
  • Dimensions – 168 x 94 x 47 mm
  • Weight – 296.5 grams

The camera can not only do inference, but also train deep learning models using Amazon infrastructure. Performance-wise, the camera can infer 14 images/second on AlexNet, and 5 images/second on ResNet 50 for a batch size of 1.
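Those throughput figures translate directly into an average processing time per image, since a batch size of 1 means images are handled one at a time. A small sketch (the function name is only for illustration):

```python
def latency_ms(images_per_second):
    """Average time per image in milliseconds, assuming one image at a time (batch size 1)."""
    return 1000.0 / images_per_second


for model, rate in {"AlexNet": 14, "ResNet 50": 5}.items():
    print(f"{model}: {rate} images/s -> about {latency_ms(rate):.0f} ms per image")
```

So a ResNet 50 based project would update its results roughly every 200 ms, while an AlexNet based one would need around 70 ms per image.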

Six sample projects are currently available, including object detection, hot dog not hot dog, cat and dog, activity detection, and face detection. Read that blog post to see how to get started.

But if you want to make your own project, a typical workflow would be as follows:

  • Train a deep learning model using Amazon SageMaker
  • Optimize the trained model to run on the AWS DeepLens edge device
  • Develop an AWS Lambda function to load the model and use it to run inference on the video stream
  • Deploy the AWS Lambda function to the AWS DeepLens device using AWS Greengrass
  • Wire the edge AWS Lambda function to the cloud to send commands and receive inference output

These steps are explained in detail on the Amazon blog.

Intel also published a press release explaining how they are involved in the project:

DeepLens uses Intel-optimized deep learning software tools and libraries (including the Intel Compute Library for Deep Neural Networks, Intel clDNN) to run real-time computer vision models directly on the device for reduced cost and real-time responsiveness.

Developers can start designing and creating AI and machine learning products in a matter of minutes using the preconfigured frameworks already on the device. Apache MXNet is supported today, and Tensorflow and Caffe2 will be supported in 2018’s first quarter.

AWS DeepLens can be pre-ordered today for $249 by US customers only (or those using a forwarding service) with shipping expected on April 14, 2018. Visit the product page on AWS for more details.

Hisilicon Hi3559A V100ES is an 8K Camera SoC with a Neural Network Accelerator

November 22nd, 2017 3 comments

Earlier today, I published a review of JeVois-A33 machine vision camera, noting that processing is handled by the four Cortex A7 cores of Allwinner A33 processor, but in the future we can expect this type of camera to support acceleration with OpenCL/Vulkan capable GPUs, or better, neural network accelerators (NNA) such as Imagination Tech PowerVR Series 2NX.

HiSilicon already launched Kirin 970 SoC with similar IP, except they call it an NPU (Neural-network Processing Unit). However, while looking for camera SoCs with an NNA, I found a list of deep learning processors, including the ones that go into powerful servers and autonomous vehicles, that also included an 8K Camera SoC with a dual core CNN (Convolutional Neural Network) acceleration engine made by Hisilicon: Hi3559A V100ES.

Hisilicon Hi3559A V100ES specifications:

  • Processor Cores
    • 2x ARM Cortex A73 @ 2 GHz, 32 KB I cache, 64KB D cache or 512 KB L2 cache
    • 2x ARM Cortex A53 @ 1 GHz, 32 KB I cache, 32KB D cache or 256 KB L2 cache
    • 1x single core ARM Cortex A53 @ 1 GHz, 32 KB I cache, 32KB D cache /128 KB L2 cache
    • Neon acceleration and integrated FPU
  • GPU – Triple core ARM Mali-G71 @ 900 MHz with 256KB cache, support for OpenCL 1.1/1.2/2.0, and OpenGL ES 3.0/3.1/3.2
  • Sensor Hub
    • ARM Cortex M7 @200 MHz
    • PMC, which supports only external reset, internal POR
    • General peripheral IPs (UART, SPI, I2C, PWM, GPIO, and LSADC)
    • 3-channel LSADC, 5x UART interfaces, and 8x PWM interfaces
  • Memory Interface – 32-/64-bit DDR4 up to 8GB
  • Storage Interfaces – SPI NOR flash up to 512MB, NAND flash, eMMC 5.1 up to 2TB, UFS 2.1 up to 512GB
  • Video Encoding – H.264 BP/MP/HP, and H.265 Main Profile/Main 10 Profile up to 7680 x [email protected] [email protected] fps+7680 x [email protected] fps snapshot
  • Video Decoding – H.264 BP/MP/HP, H.265 MP/Main 10/High Tier up to [email protected] fps or H.264/H.265 [email protected] fps
  • Intelligent Video Analysis
    • Integrated intelligent analysis and acceleration engine, allowing customers to develop intelligent applications targeted for mobile camera products
    • Dual-core DSP @ 700 MHz, 32 KB I cache, 32 KB IRAM, or 512 KB DRAM
    • Dual-core CNN @ 700 MHz neural network acceleration engine
  • Video and Graphics Processing
    • 3DNR, image enhancement, and DCI
    • Anti-flicker for output videos and graphics
    • 1/15.5x to 16x video & graphics scaling
    • Horizontal seamless stitching of 2-channel videos, and 360° or 720° panoramic stitching of up to 6-channel videos
    • OSD overlaying of eight regions before encoding
    • Video graphics overlaying of two layers (video layer and graphics layer)
  • 2-channel ISP
    • Adjustable 3A functions (AE, AWB, and AF)
    • FPN removal
    • Highlight suppression, backlight compensation, gamma correction, and color enhancement
    • DPC, NR, and 6-DOF DIS
    • Anti-fog
    • LDC and fisheye correction
    • Picture rotation by 90° or 270°; picture mirror and flip
    • HDR10, BT.2020 WCG
    • Sensor built-in WDR, 4F/3F/2F frame-based/line-based
    • WDR and local tone mapping
    • ISP tuning tools for the PC
  • Audio Encoding and Decoding
    • Voice encoding/decoding complying with multiple protocols by using software
    • MP3, AAC, and other audio encoding formats
    • Audio 3A functions (AEC, ANR, and ALC)
  • Security Engine
    • AES, DES, and 3DES encryption and decryption algorithms implemented by using hardware
    • RSA1024/2048/3072/4096 signature verification algorithm implemented by using hardware
    • SHA1/224/256/384/512 of the HASH and HMAC_SHA1/224/256/384/512 tamper proofing algorithms implemented by using hardware
    • Integrated 32-kbit OTP storage space and hardware random number generator
  • Video Interfaces
    • Input
      • Multiple sensor inputs. The maximum resolution is 32 megapixels (7680 x 4320).
      • 8-/10-/12-/14-bit RGB Bayer DC timing VI, up to 150 MHz clock frequency
      • BT.601, BT.656, and BT.1120 VI interfaces
      • Maximum 16-lane MIPI/LVDS/sub-LVDS/HiSPi/SLVS-EC interface for the serial sensor inputs
      • Maximum 6-channel video inputs for the serial sensor inputs, supporting various working modes such as 1×16-lane/2×8-lane/4×4-lane/2×4-lane+4×2-lane
    • Output
      • HDMI 2.0, supporting maximum [email protected] fps output
      • 8-/16-/24-bit RGB digital LCD output, supporting maximum 1920 x [email protected] fps output
      • 4-lane MIPI DSI output, supporting maximum 2.5 Gbit/s per lane frequency
  • Audio Interfaces
    • Integrated audio codec, supporting 16-bit audio inputs and outputs
    • I2S interface for connecting to the external audio codec
    • Dual-channel differential MIC inputs for reducing background noises
  • Peripherals
    • POR, external reset input
    • Internal RTC
    • Integrated 2-channel LSADC
    • 5x UART interfaces
    • IR interface, I2C interface, SSP main interface, and GPIO interface
    • Integrated GMAC, supporting RGMII and RMII
    • 2x PWM interfaces
    • 2x SD 3.0/SDIO 3.0 interfaces, supporting SDXC
    • 2x USB 3.0/USB 2.0 host/device ports
    • 2-lane PCIe 2.0 RC/EP mode
  • Operating Voltages – 0.8V core voltage, 1.8V I/O voltage, 1.2V DDR4 voltage
  • Power Consumption – 2.6 Watts
  • Package – 15 x 15 mm with 0.4 mm pitch
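To put the 7680 x 4320 maximum sensor resolution from the video input specifications into perspective, the sketch below computes the pixel count and the uncompressed data rate the SoC would have to ingest. The 12-bit depth is one of the Bayer formats listed above, but the 30 fps frame rate is an assumption picked purely for illustration, and the function names are hypothetical.

```python
def megapixels(width, height):
    """Pixel count in millions of pixels."""
    return width * height / 1e6


def raw_gbit_per_s(width, height, bits_per_pixel, fps):
    """Uncompressed sensor data rate in Gbit/s."""
    return width * height * bits_per_pixel * fps / 1e9


w, h = 7680, 4320
print(f"{w} x {h} = {megapixels(w, h):.1f} MP")
# 12-bit Bayer at 30 fps is an assumed operating point, for illustration only.
print(f"Raw 12-bit stream at 30 fps: {raw_gbit_per_s(w, h, 12, 30):.1f} Gbit/s")
```

The product works out to about 33.2 megapixels, so the 32 megapixels quoted in the specifications looks like a rounded figure. At the assumed operating point the raw stream is close to 12 Gbit/s, which gives an idea of why a 16-lane sensor interface is provided.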

Boy, that’s a monster… They should have called it MOACSoC (Mother of All Camera SoCs) 🙂 The main ARM cores are said to run Linux+Huawei LiteOS AMP heterogeneous dual systems, and the company provides a dedicated SDK for the consumer mobile camera, a client for iOS and Android mobile phones, and a high-performance H.265 decoding library. The SDK might be in the wild as “Hi3559AV100ES_SDK_V2.0.2.0” but I did not find a download link. I got all information above from the Hi3559A V100ES ultra-HD Mobile Camera SoC product brief.

Mobile Camera and Professional Camera Solution Block Diagram

Based on the block diagram above, some mobile and professional cameras will start taking SSDs besides the boring SD cards and USB 2.0/3.0 storage devices.

Hi3559A V100ES will also be found in drone cameras, 3D/VR cameras, and 4K/8K network-based EDR. I have no idea what the latter stands for, but the photo in the document looks like a car dashboard camera with display. Anyway, this should allow for some interesting use cases with near real-time object recognition.

Hisilicon showcased a dynamic object categorization and identification system at CPSE2017 in Shenzhen earlier this month. The company did not mention Hi3559A V100, but made clear an 8K solution was used.

If we are to believe one person on Tencent’s ncnn Github repo, performance is really good with a 10ms lag for GoogleNet, and 89ms for VGG-SSD. We’ll have to wait a little to get more details, and Hisilicon did not post any product info on their website about their new 8K SoC, only about the earlier Hi3559 2K/4K SoC.

JeVois Smart Machine Vision Camera Review – Part 1: Developer / Robotics Kit Unboxing

October 24th, 2017 No comments

JeVois-A33 computer vision camera was unveiled at the end of last year through a Kickstarter campaign. Powered by an Allwinner A33 quad core Cortex A7 processor, and a 1.3MP camera sensor, the system could detect motion, track faces and eyes, detect & decode ArUco markers & QR codes, follow lines for autonomous cars, etc. thanks to JeVois framework.

Most rewards from Kickstarter shipped in April of this year, so it’s quite possible some of the regular readers of this blog are already familiar with the camera. But the developer (Laurent Itti) re-contacted me recently, explaining they had improved the software with Python support, and new features such as the capability of running deep neural networks directly on the processor inside the smart camera. He also wanted to send a review sample, which I received today, but I got a bit more than I expected, so I’ll start the review with an unboxing of what they call the “Developer / Robotics Kit”.

I got the kit in a white package, so I’ll skip the photo, and go straight to the contents.

I was really expecting to receive a tiny camera, and not much else. So my first reaction was: “what!?” 🙂

You’ll find 5 mini USB cables inside (from top left to bottom middle):

  • USB to micro serial adapter cable, 1m long, to access the serial console in the camera when running in debug mode, or while troubleshooting Arduino code
  • mini USB + micro USB splitter cable, 15cm long, to power both the camera and Arduino board from the power bank
  • mini USB Y cable, 80cm long, to power the board via two USB 2.0 ports or to one USB 3.0 port on your host computer
  • mini USB cable, 23cm long, to power the camera from a USB port or power bank.
  • mini USB cable, 75cm long, to connect the camera to one USB 3.0 port or power bank.

The kit also includes an 8GB micro SD card pre-loaded with JeVois software, an SD adapter, a micro SD card reader, a 5V USB tester compatible with QuickCharge 2.0 to monitor the power consumption of the camera with your chosen algorithm, a 2,600 mAh power bank (large enough to power the camera for several hours), an Arduino compatible Pro mini board based on Microchip Atmel Atmega 32U4 MCU, and a business card providing useful information such as a link to a Quick Start Guide.
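As a rough sanity check of the "several hours" estimate for the 2,600 mAh power bank, here is a back-of-the-envelope runtime calculation. None of the electrical figures below come from the kit documentation: the 3.7 V nominal cell voltage, the 85% conversion efficiency, and the 2 to 4 W load range are all assumptions, and the actual draw is exactly what the included USB tester lets you measure.

```python
def runtime_hours(capacity_mah, cell_voltage, efficiency, load_watts):
    """Estimated runtime of a power bank feeding a constant load."""
    energy_wh = capacity_mah / 1000.0 * cell_voltage
    return energy_wh * efficiency / load_watts


# Assumptions (not from the review): 3.7 V nominal cell voltage,
# 85% boost-converter efficiency, and loads between 2 and 4 W.
for load in (2.0, 3.0, 4.0):
    hours = runtime_hours(2600, 3.7, 0.85, load)
    print(f"{load:.1f} W load: about {hours:.1f} h")
```

Under these assumptions the power bank would last somewhere between two and four hours depending on the algorithm running on the camera.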

Oh… I almost forgot. Can you see the “fan” in the middle of the photo above? That’s the actual JeVois-A33 camera. I knew it was small, but once you put it into your hands, you realize how tiny it is. The cable on the left of the camera is a micro serial cable to connect to an MCU board.

The back of the camera features all the ports and connectors with a micro SD slot, a mini USB port, the micro serial port connector (which looks like a battery connector), and a dual color LED on the left of the micro serial connector that indicates power and camera status.

The bottom reveals an opening to cool down the AXP223 PMIC.

If you’re interested in the exact developer/robotics kit I’ve received, you can purchase it for $99.99 on JeVois, Amazon, or RobotShop (with locations in US, Canada, Japan, and France). But if you just want the camera without all the cables and accessories, $49.99 will do.