Archive

Posts Tagged ‘tensorflow’

Arm’s Project Trillium Combines Machine Learning and Object Detection Processors with Neural Network Software

February 14th, 2018

We’ve already seen Neural Processing Units (NPUs) added to Arm processors such as Huawei Kirin 970 or Rockchip RK3399Pro in order to handle machine learning & artificial intelligence tasks in a faster or more power-efficient way.

Arm has now announced Project Trillium, which combines two A.I. processors, one ML (Machine Learning) processor and one OD (Object Detection) processor, with open source Arm NN (Neural Network) software that leverages the ML processor as well as Arm CPUs and GPUs.

Arm ML processor key features and performance:

  • Fixed-function engine for the best performance & efficiency with current solutions
  • Programmable layer engine for future-proofing the design
  • Tuned for advanced geometry implementations
  • On-board memory to reduce external memory traffic
  • Performance / Efficiency – 4.6 TOPS with an efficiency of 3 TOPS/W for mobile devices and smart IP cameras
  • Scalable design usable from lower requirement IoT (20 GOPS) and mobile (2 to 5 TOPS) applications up to more demanding server and networking loads (up to 150 TOPS)
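As a quick sanity check on the quoted figures (my back-of-the-envelope arithmetic, not numbers from Arm), a 4.6 TOPS design running at 3 TOPS/W implies a power budget of roughly 1.5 W, and at the same efficiency the scaling tiers would draw roughly proportional power:

```python
# Back-of-the-envelope check on the quoted ML processor figures.
throughput_tops = 4.6         # quoted peak throughput, TOPS
efficiency_tops_per_w = 3.0   # quoted efficiency, TOPS/W

# Implied power budget for the mobile configuration:
power_w = throughput_tops / efficiency_tops_per_w
print(f"Implied power: {power_w:.2f} W")  # ~1.53 W

# Assuming the same efficiency across tiers (a simplification):
for name, tops in [("IoT", 0.02), ("Mobile low", 2), ("Mobile high", 5), ("Server", 150)]:
    print(f"{name}: ~{tops / efficiency_tops_per_w:.3f} W")
```

The server-class figure (~50 W) only makes sense because efficiency typically changes with scale, so treat this strictly as an illustration of the mobile numbers.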

Arm OD processor main features:

  • Detects objects in real time running with Full HD (1920×1080) at 60fps (no dropped frames)
  • Object sizes – 50×60 pixels to full screen
  • Virtually unlimited objects per frame
  • Image Input – Raw input from camera or from ISP
  • Latency – 4 frames
  • Can be combined with CPUs, GPUs or the Arm ML processor for additional local processing, reducing overall compute requirement
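To put the 4-frame latency in context (my arithmetic, not an Arm figure): at 60 fps each frame takes about 16.7 ms, so detection results trail the live image by roughly 67 ms.

```python
# Convert the quoted frame-based latency into wall-clock time at 60 fps.
fps = 60
latency_frames = 4

frame_time_ms = 1000 / fps                 # ~16.7 ms per frame
latency_ms = latency_frames * frame_time_ms
print(f"{frame_time_ms:.1f} ms/frame -> ~{latency_ms:.0f} ms detection latency")
```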

The company provides drivers, a detailed people model with rich metadata allowing the detection of direction, trajectory, pose, and gesture, an object tracking library, and sample applications. I could not find any of those online, and it appears you have to contact the company for details if you plan to use this in your chip.


Arm will be more open about their Arm NN software development kit, which bridges existing neural network frameworks like TensorFlow, Caffe(2), or MXNet to the underlying processing hardware, including Arm Cortex CPUs, Arm Mali GPUs, and the new ML processor. Arm NN will focus on inference, which happens on the target device (aka at the edge), rather than training, which happens on powerful servers.

Arm NN utilizes the Compute Library to target Cortex-A CPUs, Mali GPUs, and the Arm ML processor, and CMSIS-NN for Cortex-M CPUs. The first release will support Caffe, with downloads, resources, and documentation becoming available in March 2018; TensorFlow support will come next, and other neural network frameworks will subsequently be added to the SDK. There also seems to be an Arm NN release for Android 8.1 (NNAPI) that currently works with CPUs and GPUs.

Visit Arm’s machine learning page for more details about Project Trillium, the ML & OD processors, and the Arm NN SDK. You may also be interested in a few related blog posts on the Arm community.

Rockchip RK3399Pro SoC Integrates a 2.4 TOPS Neural Network Processing Unit for Artificial Intelligence Applications

January 8th, 2018

Rockchip RK3399 (aka OP1) SoC was launched in 2016 with a hexa-core Arm Cortex A72/A53 processor, Mali-T860MP4 GPU, support for 4K video decoding, and high speed interfaces like USB 3.0 and PCIe, as well as Gigabit Ethernet. The processor is found in Chromebooks, TV boxes, development boards, and other devices.

The company has unveiled an upgraded “Pro” version of the processor at CES 2018. Rockchip RK3399Pro appears to have most of the same features as its predecessor but adds a neural network processing unit (NPU) delivering up to 2.4 TOPS for artificial intelligence and deep learning applications.

The company claims that, compared to traditional solutions, the computing performance of typical deep neural network models such as Inception V3, ResNet34, and VGG16 on RK3399Pro is improved by almost one hundred times, and that power consumption is less than 1% of that of A.I. solutions implemented using GPU acceleration.

Based on the information provided in the chart above (source: elDEE on Twitter), Rockchip RK3399Pro outperforms other high-end SoCs (for such tasks) including Apple A11 and Huawei Kirin 970, both of which also feature an NPU, and even offers better performance than NVIDIA TX2.

The RK3399Pro NPU supports 8-bit & 16-bit operations, is compatible with various AI software frameworks and APIs, including OpenVX and TensorFlow Lite/Android NN API, and comes with AI software tools capable of handling Caffe/TensorFlow models. An RK3399Pro hardware reference design can also be provided to speed up development, but no details were given.



Via Liliputing

Laceli AI Compute Stick is a More Powerful & Efficient Alternative to Intel/Movidius Neural Compute Stick

January 3rd, 2018

Intel’s Movidius Neural Compute Stick is a low power deep learning inference kit and “self-contained” artificial intelligence (A.I.) accelerator that connects to the USB port of computers or development boards like Raspberry Pi 3, delivering three times more performance than a solution accelerated with VideoCore IV GPU.

So far it was the only A.I. USB stick solution I had heard of, but Gyrfalcon Technology, a US startup founded at the beginning of last year, has developed its own “artificial intelligence processor”, Lightspeeur 2801S, as well as a neural USB compute stick featuring the solution: Laceli AI Compute Stick.

The company claims the Laceli AI Compute Stick delivers 2.8 TOPS (trillion operations per second) of performance within 0.3 Watts, which is about 90 times more efficient than the Movidius USB stick that delivers 100 GFLOPS (0.1 TOPS) within 1 Watt.
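The ~90x claim can be checked from the quoted numbers, assuming TOPS and GFLOPS are treated as comparable operation counts the way the comparison does:

```python
# Performance-per-watt comparison from the quoted figures.
laceli_tops, laceli_w = 2.8, 0.3
movidius_tops, movidius_w = 0.1, 1.0   # 100 GFLOPS = 0.1 TOPS

laceli_eff = laceli_tops / laceli_w          # ~9.33 TOPS/W
movidius_eff = movidius_tops / movidius_w    # 0.1 TOPS/W
print(f"Efficiency ratio: ~{laceli_eff / movidius_eff:.0f}x")  # ~93x
```

So the arithmetic gives ~93x, in line with the company’s “90 times” claim (integer ops and floating-point ops are not strictly equivalent, so this is only an order-of-magnitude comparison).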

Information about the processor and stick is rather limited, but Gyrfalcon explains their APiM architecture (AI Processing in Memory) uses memory as the AI processing unit, which eliminates the large data movements that result in high power consumption. The processor comes with 28,000 parallel computing cores and does not require external memory. The stick is also equipped with 4GB storage, a USB 3.0 port, and works with Caffe and TensorFlow.

Movidius Neural Compute Stick vs Laceli AI Compute Stick

On the company website, we’ll also find a complete development kit with a USB interface, eMMC flash, and a special access port, as well as a multi-chip board with PCIe and M.2 interfaces that appears to combine eight Lightspeeur 2801S AI processors.

The processor is already in production, and available to “qualified customers”, while the Laceli AI Compute Stick will first be showcased at CES 2018 in Las Vegas in a few days.

Thanks to TLS for the tip.

Khronos Group Releases Neural Network Exchange Format (NNEF) 1.0 Specification

December 24th, 2017

The Khronos Group, the organization behind widely used standards for graphics, parallel computing, and vision processing such as OpenGL, Vulkan, and OpenCL, has recently published the NNEF 1.0 (Neural Network Exchange Format) provisional specification for the universal exchange of trained neural networks between training frameworks and inference engines.

NNEF aims to reduce machine learning deployment fragmentation by enabling data scientists and engineers to easily transfer trained networks from their chosen training framework into various inference engines via a single standardized exchange format. NNEF encapsulates a complete description of the structure, operations and parameters of a trained neural network, independent of the training tools used to produce it and the inference engine used to execute it. The new format has already been tested with tools such as TensorFlow, Caffe2, Theano, Chainer, MXNet, and PyTorch.

Khronos has also released open source tools to manipulate NNEF files, including an NNEF syntax parser/validator and example exporters, which can be found in the NNEF-Tools GitHub repository. The provisional NNEF 1.0 specification can be found here, and you can use the issue tracker in the NNEF-Docs GitHub repository to provide feedback or leave comments.

Merry Xmas to all!

Qualcomm Snapdragon 845 Octa Core Kryo 385 SoC to Power Premium Smartphones, XR Headsets, Windows Laptops

December 7th, 2017

Qualcomm Snapdragon 845 processor had been expected since May 2017 with four custom Cortex A75 cores, four Cortex A53 cores, Adreno 630 GPU, and X20 LTE modem, with the launch planned for Q1 2018. At least, that’s what the leaks said.

Qualcomm has now formally launched the Snapdragon 845 Mobile Platform, and the rumors were mostly right, as the octa-core processor comes with four Kryo 385 Gold cores (custom Cortex A75), four Kryo 385 Silver cores (custom Cortex A55) leveraging DynamIQ technology, an Adreno 630 “Visual Processing System”, and a Snapdragon X20 modem supporting LTE Cat18/13.

The processor is said to use more advanced artificial intelligence (AI) allowing what the company calls “extended reality (XR)” applications, and will soon be found in flagship smartphones, XR headsets, mobile PCs, and more.

Qualcomm Snapdragon 845 (SDM845) specifications:

  • Processor
    • 4x Kryo 385 Gold performance cores @ up to 2.80 GHz (custom ARM Cortex A75 cores)
    • 4x Kryo 385 Silver efficiency cores @ up to 1.80 GHz (custom ARM Cortex A55 cores)
    • DynamIQ technology
  • GPU (Visual Processing Subsystem) – Adreno 630 supporting OpenGL ES 3.2, OpenCL 2.0, Vulkan 1.x, DxNext
  • DSP
    • Hexagon 685 with 3rd Gen Vector Extensions, Qualcomm All-Ways Aware Sensor Hub
    • Supports Snapdragon Neural Processing Engine (NPE) SDK, Caffe, Caffe2, and Tensorflow
  • Memory I/F – LPDDR4x, 4×16 bit up to 1866MHz, 8GB RAM
  • Storage I/F – TBD (Likely UFS 2.1, but maybe UFS 3.0?)
  • Display
    • Up to 4K Ultra HD, 60 FPS, or dual 2400×2400 @ 120 FPS (VR); 10-bit color depth
    • DisplayPort and USB Type-C support
  • Audio
    • Qualcomm Aqstic audio codec and speaker amplifier
    • Qualcomm aptX audio playback with support for aptX Classic and HD
    • Native DSD support, PCM up to 384kHz/32bit
  • Camera
    • Spectra 280 ISP with dual 14-bit ISPs
    • Up to 16 MP dual camera, up to 32 MP single camera
    • Support for 16MP image sensor operating up to 60 frames per second
    • Hybrid Autofocus, Zero Shutter Lag, Multi-frame Noise Reduction (MFNR)
    • Video Capture – Up to 4K @ 60fps HDR (H.265), up to 720p @ 480fps (slow motion)
  • Connectivity
    • Cellular Modem – Snapdragon X20 with peak download speed: 1.2 Gbps (LTE Cat 18), peak upload speed: 150 Mbps (LTE Cat 13)
    • Qualcomm Wi-Fi 802.11ad Multi-gigabit, integrated 802.11ac 2×2 with MU-MIMO, 2.4 GHz, 5 GHz and 60 GHz
    • Qualcomm TrueWireless Bluetooth 5
  • Location – Support for 6 satellite systems: GPS, GLONASS, Beidou, Galileo, QZSS, SBAS; low power geofencing and tracking, sensor-assisted navigation
  • Security – Qualcomm Secure Processing Unit (SPU), Qualcomm Processor Security, Qualcomm Mobile Security, Qualcomm Content Protection
  • Charging – Qualcomm Quick Charge 4/4+ technology
  • Process – 10nm LPP

The company will provide support for Android and Windows operating systems. eXtended Reality (XR) is enabled with features such as room-scale 6DoF with simultaneous localization and mapping (SLAM), advanced visual inertial odometry (VIO), and Adreno Foveation. Maybe I don’t follow the phone market closely enough, but I can’t remember seeing odometry implemented in any other phones, and Adreno Foveation is not quite self-explanatory, so the company explains it combines graphics rendering with eye tracking, directing the highest graphics resources to where you’re physically looking while using fewer resources to render other areas. This improves the experience and performance, and lowers power consumption.



Compared to Snapdragon 835, the new processor is said to be around 25 to 30% faster, the Spectra camera and Adreno graphics architectures are claimed to boost power efficiency by up to 30 percent, and the LTE modem is a bit faster (1.2 Gbps/150 Mbps vs 1.0 Gbps/150 Mbps). Quick Charge 4+ technology should deliver up to a 50 percent charge in 15 minutes. Earlier this year, when SD835 was officially launched, there was virtually no mention of artificial intelligence support in mobile APs, but now an NNA (Neural Network Accelerator) or NPE (Neural Processing Engine) is part of most high-end mobile processors, which in SD845 appears to be handled through the Hexagon 685 DSP. High Dynamic Range (HDR) video playback and capture is also a novelty in the new Snapdragon processor.

One of the first devices powered by Snapdragon 845 will be the Xiaomi Mi 7 smartphone, which according to leaks will come with a 6.1″ display, up to 8GB RAM, a dual camera, 3D facial recognition, and more. Further details about the phone are expected at Mobile World Congress 2018. Considering the first Windows 10 laptops based on the Snapdragon 835 processor are expected in H1 2018, we may have to wait until the second half of the year for the launch of Snapdragon 845 mobile PCs.

More details may be found on Qualcomm Snapdragon 845 mobile platform product page.

AWS DeepLens is a $249 Deep Learning Video Camera for Developers

November 30th, 2017

Amazon Web Services (AWS) has launched DeepLens, the “world’s first deep learning enabled video camera for developers”. Powered by an Intel Atom X5 processor with 8GB RAM, and featuring a 4MP (1080p) camera, the fully programmable system runs Ubuntu 16.04, and is designed to expand the deep learning skills of developers, with Amazon providing tutorials, code, and pre-trained models.


AWS DeepLens specifications:

  • Camera – 4MP (1080p) camera using MJPEG, H.264 encoding
  • Video Output – micro HDMI port
  • Audio – 3.5mm audio jack, and HDMI audio
  • Connectivity – Dual band WiFi
  • USB – 2x USB 2.0 ports
  • Misc – Power button; camera, WiFi and power status LEDs; reset pinhole
  • Power Supply – TBD
  • Dimensions – 168 x 94 x 47 mm
  • Weight – 296.5 grams

The camera is not limited to inference; deep learning models can also be trained using Amazon infrastructure. Performance-wise, the camera can infer 14 images/second on AlexNet, and 5 images/second on ResNet 50 for a batch size of 1.

Several project samples are currently available, including object detection, hot dog not hot dog, cat and dog, activity detection, and face detection. Read that blog post to see how to get started.

But if you want to make your own project, a typical workflow would be as follows:

  • Train a deep learning model using Amazon SageMaker
  • Optimize the trained model to run on the AWS DeepLens edge device
  • Develop an AWS Lambda function to load the model and use to run inference on the video stream
  • Deploy the AWS Lambda function to the AWS DeepLens device using AWS Greengrass
  • Wire the edge AWS Lambda function to the cloud to send commands and receive inference output
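The inference step in the middle of this workflow is essentially a Lambda handler that grabs a camera frame, runs the model on it, and returns a result. Here is a runnable sketch of that shape; note that `load_model`, `get_frame`, and `infer` are hypothetical stubs standing in for the on-device SDK calls, not real AWS or DeepLens APIs:

```python
import json

def load_model(path):
    # Stand-in: on the device this would load the optimized model artifact
    # produced in the "optimize the trained model" step.
    return {"path": path}

def get_frame():
    # Stand-in: on the device this would grab the latest camera frame.
    return [[0] * 4 for _ in range(4)]  # placeholder "image"

def infer(model, frame):
    # Stand-in for the model inference call, returning label/score pairs.
    return {"dog": 0.91, "cat": 0.07}

def lambda_handler(event, context):
    model = load_model("model_artifact.xml")  # hypothetical artifact name
    frame = get_frame()
    results = infer(model, frame)
    best = max(results, key=results.get)      # pick the top-scoring label
    return {"statusCode": 200,
            "body": json.dumps({"label": best, "confidence": results[best]})}

print(lambda_handler({}, None))
```

The real function would be deployed to the device via AWS Greengrass (step 4) and wired back to the cloud (step 5); this sketch only illustrates the handler’s control flow.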

These steps are explained in detail on the Amazon blog.


Intel also published a press release explaining how they are involved in the project:

DeepLens uses Intel-optimized deep learning software tools and libraries (including the Intel Compute Library for Deep Neural Networks, Intel clDNN) to run real-time computer vision models directly on the device for reduced cost and real-time responsiveness.

Developers can start designing and creating AI and machine learning products in a matter of minutes using the preconfigured frameworks already on the device. Apache MXNet is supported today, and TensorFlow and Caffe2 will be supported in 2018’s first quarter.

AWS DeepLens can be pre-ordered today for $249 by US customers only (or those using a forwarding service) with shipping expected on April 14, 2018. Visit the product page on AWS for more details.

Google Releases Tensorflow Lite Developer Preview for Android & iOS

November 17th, 2017

Google mentioned TensorFlow Lite, an implementation of the TensorFlow open source machine learning library specifically optimized for embedded use cases, at Google I/O 2017 last May. The company said support was coming to Android Oreo, but it was not possible to evaluate the solution at the time.

The company has now released a developer preview of TensorFlow Lite for mobile and embedded devices with a lightweight cross-platform runtime that, for now, runs on Android and iOS.

TensorFlow Lite Architecture

TensorFlow Lite supports the Android Neural Networks API to take advantage of machine learning accelerators when available, but falls back to CPU execution otherwise.

The architecture diagram above shows three components for TensorFlow Lite:

  • TensorFlow Model – A trained TensorFlow model saved on disk.
  • TensorFlow Lite Converter – A program that converts the model to the TensorFlow Lite file format.
  • TensorFlow Lite Model File – A model file format based on FlatBuffers, that has been optimized for maximum speed and minimum size.

The model file is then used within a mobile app through a C++ or Java (Android only) API, and an interpreter that optionally uses the Neural Networks API.

TensorFlow Lite currently supports three models: MobileNet (a class of vision models able to identify objects across 1,000 different classes), Inception v3 (an image recognition model with higher accuracy but larger size), and Smart Reply (an on-device conversational model for one-touch replies to chat messages).

The preview release is available on GitHub, where you’ll also find a demo app that can be tried with a pre-built binary, but it’s probably more fun/useful to instead build it from source in Android Studio and change the code to experiment and learn. You can also build the complete framework and demo app from source by cloning the repo. TensorFlow Lite may also be coming to Linux soon, as one of the comments in the announcement mentions that “it should be pretty easy to build TensorFlow Lite on Raspberry PI. We plan to make sure this path works seamlessly soon“. While most of the documentation can be found on GitHub, some more info may be available on the TensorFlow Lite page.

Google Pixel Visual Core is a Custom Designed Co-Processor for Smartphone Cameras

October 18th, 2017

Google unveiled their latest Pixel 2 & Pixel 2 XL premium smartphones powered by Snapdragon 835 SoC earlier this month, and while they are expected to go on sale tomorrow, reviewers have already got their hands on samples, and one of the key features is the camera, which takes really good photos and videos, as reported here and there.

You’d think the ISP and DSP inside the Snapdragon 835 SoC would handle any sort of processing required to take photos. But apparently that was not enough, as Google decided to design their own custom co-processor, called Pixel Visual Core, and integrate it into the Pixel 2 phones.

The co-processor features a Cortex A53 core, an LPDDR4 memory interface, a PCIe interface, and a MIPI CSI interface, as well as an image processing unit (IPU) IO block with 8 IPU cores. Google explains the IPU block will allow 3rd party applications to leverage features like low latency HDR+ photography, where the camera takes photos with different exposures very quickly, and “juxtaposes” them to provide the best possible photo.

Each IPU core includes 512 arithmetic logic units (ALUs), and the IPU delivers more than 3 TOPS (trillion operations per second) on a mobile power budget. Pixel Visual Core allows HDR+ to run 5x faster using a tenth of the energy required by running the algorithm on the application processor (AP). Programming is done using domain-specific languages: Halide for image processing and TensorFlow for machine learning, with a Google-made compiler optimizing the code for the hardware.
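Those numbers hang together plausibly (my arithmetic, not Google’s): 8 cores with 512 ALUs each gives 4,096 ALUs in total, so 3 TOPS works out to roughly 732 million operations per second per ALU, consistent with a clock in the hundreds of MHz (e.g. roughly half that frequency if a multiply-accumulate is counted as two operations):

```python
# Sanity-check the Pixel Visual Core throughput claim per ALU.
cores = 8
alus_per_core = 512
peak_ops = 3e12                        # "more than 3 TOPS"

total_alus = cores * alus_per_core     # 4096 ALUs
ops_per_alu = peak_ops / total_alus
print(f"{total_alus} ALUs, ~{ops_per_alu / 1e6:.0f}M ops/s each")  # ~732M ops/s
```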

Pixel Visual Core will be accessible as a developer option in the developer preview of Android Oreo 8.1 (MR1), before being enabled for any apps using the Android Camera API.