Posts Tagged ‘machine learning’

Google Pixel Visual Core is a Custom Designed Co-Processor for Smartphone Cameras

October 18th, 2017 1 comment

Google unveiled its latest Pixel 2 & Pixel 2 XL premium smartphones, powered by the Snapdragon 835 SoC, earlier this month. While they are only expected to go on sale tomorrow, reviewers have already got their hands on samples, and one of the key features is the camera, which takes really good photos and videos as reported here and there.

You’d think the ISP and DSP inside the Snapdragon 835 SoC would handle any sort of processing required to take photos, but apparently that was not enough, as Google decided to design its own custom co-processor, called Pixel Visual Core, and integrated it into the Pixel 2 phones.

The co-processor features a Cortex A53 core, an LPDDR4 memory interface, a PCIe interface, and a MIPI CSI interface, as well as an image processing unit (IPU) IO block with 8 IPU cores. Google explains the IPU block will allow 3rd party applications to leverage features like low latency HDR+ photography, where the camera takes several photos with different exposures in quick succession, and merges them to produce the best possible photo.

Each IPU core includes 512 arithmetic logic units (ALUs), and the IPU delivers more than 3 TOPS (trillion operations per second) on a mobile power budget. Pixel Visual Core allows HDR+ to run 5x faster using a tenth of the energy required to run the algorithm on the application processor (AP). Programming is done using domain-specific languages: Halide for image processing and TensorFlow for machine learning, with a Google-made compiler optimizing the code for the hardware.
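
Google has not published the HDR+ pipeline in detail, but the basic idea of merging several fast shots at different exposures can be sketched in a few lines of Python. The weighting function below is a hypothetical, illustrative choice, not Google's actual algorithm:

```python
# Illustrative sketch of multi-frame exposure merging, loosely in the
# spirit of HDR+: several quick shots at different exposures are combined
# per pixel, with weights favouring well-exposed (mid-tone) values.
# The weighting function is a made-up example, not Google's algorithm.

def merge_exposures(frames, exposures):
    """frames: equally sized grayscale images (lists of rows, values 0..255).
    exposures: relative exposure time of each frame."""
    height, width = len(frames[0]), len(frames[0][0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            num, den = 0.0, 0.0
            for frame, exp in zip(frames, exposures):
                v = frame[y][x]
                w = 1.0 - abs(v - 127.5) / 127.5 + 1e-6  # favour mid-tones
                num += w * (v / exp)  # normalize to scene radiance
                den += w
            out[y][x] = num / den
    return out

# Two 1x2 "frames": one underexposed, one overexposed
dark = [[10, 40]]
bright = [[200, 250]]
merged = merge_exposures([dark, bright], [0.5, 2.0])
```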

Pixel Visual Core will be accessible as a developer option in the developer preview of Android Oreo 8.1 (MR1), before being enabled for any apps using the Android Camera API.

NVIDIA DRIVE PX Pegasus Platform is Designed for Fully Autonomous Vehicles

October 11th, 2017 1 comment

Many companies are now involved in the quest to develop self-driving cars, getting there step by step through the six levels of autonomous driving defined below, based on info from Wikipedia:

  • Level 0 – The automated system issues warnings but has no vehicle control.
  • Level 1 (“hands on”) – The driver and the automated system share control of the vehicle. Examples include Adaptive Cruise Control (ACC), Parking Assistance, and Lane Keeping Assistance (LKA) Type II.
  • Level 2 (“hands off”) – The automated system takes full control of the vehicle (accelerating, braking, and steering), but the driver is still expected to monitor the driving, and be prepared to intervene immediately at any time. You’ll actually have your hands on the steering wheel, just in case…
  • Level 3 (“eyes off”) – The driver can safely turn their attention away from the driving tasks, e.g. the driver can text or watch a movie. The system may ask the driver to take over in some situations specified by the manufacturer, such as traffic jams. So no sleeping while driving 🙂 . The Audi A8 Luxury Sedan was the first commercial car to claim level 3 self-driving capability.
  • Level 4 (“mind off”) – Similar to level 3, but no driver attention is ever required. You could sleep while the car is driving, or even send the car somewhere without being in the driver’s seat. There is a limitation at this level, as self-driving mode is limited to certain areas or special circumstances. Outside of these, the vehicle must be able to safely park itself if the driver does not retake control.
  • Level 5 (“steering wheel optional”) – Fully autonomous car with no human intervention required, and no other limitations.

So the goal is obviously to reach level 5, which would allow robotaxis, or safely drive you home whatever your alcohol or THC blood levels. This however requires lots of redundant (for safety) computing power, and current autonomous vehicle prototypes have a trunk full of computing equipment.

NVIDIA has condensed the AI processing power required for level 5 autonomous driving into the DRIVE PX Pegasus AI computer, which is roughly the size of a license plate, and capable of handling inputs from high-resolution 360-degree surround cameras and lidars, localizing the vehicle with centimeter accuracy, tracking vehicles and people around the car, and planning a safe and comfortable path to the destination.

The computer comes with four AI processors said to deliver 320 TOPS (trillion operations per second) of computing power, ten times faster than NVIDIA DRIVE PX 2, or about the performance of a 100-server data center according to Jensen Huang, NVIDIA founder and CEO. Specifically, the board combines two NVIDIA Xavier SoCs and two “next generation” GPUs with hardware accelerated deep learning and computer vision algorithms. Pegasus is designed for ASIL D certification with automotive inputs/outputs, including CAN bus, FlexRay, 16 dedicated high-speed sensor inputs for camera, radar, lidar and ultrasonics, plus multiple 10Gbit Ethernet ports.

Machine learning works in two steps: training on the most powerful hardware you can find, and inferencing on cheaper hardware. For autonomous driving, data scientists train their deep neural networks on the NVIDIA DGX-1 AI supercomputer, which can for example simulate driving 300,000 miles in five hours by harnessing 8 NVIDIA DGX systems. Once training is completed, the models can be updated over the air to NVIDIA DRIVE PX platforms, where inferencing takes place. The process can be repeated regularly so that the system is always up to date.
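
The train-once / deploy-the-weights split described above can be illustrated with a toy model. Everything here is a deliberately tiny stand-in, nothing like NVIDIA's actual stack: heavy training happens "in the data center", and only the learned parameters ship to the device for cheap inference:

```python
# Toy illustration of the two-step machine learning workflow: expensive
# training produces a set of weights, and only those weights are shipped
# (e.g. over the air) to a device that runs a cheap forward pass.

def train(samples, lr=0.01, epochs=2000):
    """Gradient-descent fit of y = w*x + b on (x, y) pairs."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in samples:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b  # these parameters are all that gets deployed

def infer(weights, x):
    """Inference is just a forward pass with the shipped weights."""
    w, b = weights
    return w * x + b

weights = train([(0, 1), (1, 3), (2, 5)])  # data follows y = 2x + 1
prediction = infer(weights, 3)             # should be close to 7
```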

NVIDIA DRIVE PX Pegasus will be available to NVIDIA automotive partners in H2 2018, together with NVIDIA DRIVE IX (intelligent experience) SDK, meaning level 5 autonomous driving cars, taxis and trucks based on the solution could become available in a few years.

Google’s Teachable Machine is a Simple and Fun Way to Understand How Machine Learning Works

October 9th, 2017 4 comments

Artificial intelligence, machine learning, deep learning, neural networks… are all words we hear more and more today, as machines get the ability to recognize objects, answer voice requests / commands, and so on. But many people may not know the basics of how machine learning works at all, so with that in mind, Google launched the Teachable Machine website to let people experiment with and understand the basics behind machine learning without having to install an SDK or even write code.

So I quickly tried it with Google Chrome, as it did not seem to work with Mozilla Firefox. It’s best to have audio on, as a voice explains how to use it.

Basically you connect your webcam, authorize Chrome to use it, and you should see the image in the input section on the left. After that, you can train the machine in the learning section in the middle with three different classes. You’ll be asked to wave your hand and keep pressing the “Train Green” button until you have at least 100 examples. At this stage, the machine will always detect the green class, since it’s all that it knows. Then you can train the purple class by staying still, again making sure you have at least 100 examples before you release the button. Now the machine should be able to detect whether you stay still or move, with a varying percentage of confidence. The output section will just show some animated GIFs, or play sounds or words depending on what it detects. It can learn actions (staying still, waving hands, clapping hands) and detect objects. My webcam is pretty bad, but if you have a good image, you should also be able to detect feelings like happiness, sadness, anger, anxiousness, etc… Give it a try, it’s fun.
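
Teachable Machine is reported to classify webcam frames with a nearest-neighbour scheme over image features, which explains both the "100 examples per class" advice and the confidence percentage. A stripped-down version of that idea, with made-up 2-D "features" standing in for real image embeddings:

```python
# Minimal k-nearest-neighbour classifier with a confidence score, in the
# spirit of Teachable Machine's browser demo. Real neural-network image
# features are replaced here by made-up 2-D vectors.

def classify(examples, sample, k=3):
    """examples: list of (feature_vector, class_name) training pairs.
    Returns (best_class, confidence) from the k nearest neighbours."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(examples, key=lambda ex: dist(ex[0], sample))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    best = max(votes, key=votes.get)
    return best, votes[best] / k  # fraction of neighbours that agree

training = [((0.1, 0.2), "still"), ((0.2, 0.1), "still"),
            ((0.9, 0.8), "waving"), ((0.8, 0.9), "waving"),
            ((0.85, 0.95), "waving")]
label, confidence = classify(training, (0.9, 0.9))
```

The more examples you record per class, the denser the neighbourhood around each pose, which is why the demo asks for at least 100 of them.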

The Teachable Machine has been built with a new open source hardware-accelerated JavaScript library called deeplearn.js, and Google released the source code for the website too.

ARM Cortex-A75 & Cortex-A55 Cores, and Mali-G72 GPU Details Revealed

May 27th, 2017 23 comments

We’d already seen that ARM Cortex A75 cores were coming thanks to a leak showing the Snapdragon 845 SoC will feature custom Cortex A75 cores, but we did not have many details. But since we live in a world where “to leak is glorious”, we now have some slides originally leaked through VideoCardz. That post has since been deleted, but Liliputing & TheAndroidSoul got some of the slides before deletion, so let’s see what we’ve got here.

ARM Cortex A75

So ARM Cortex-A75 will be about 20% faster than Cortex A73 for single thread operation, which is itself already 30% faster than Cortex A72. It will also be the first DynamIQ capable processor, together with Cortex A55, with both cores potentially used in big.LITTLE configuration.

Cortex A75 performance is only better at peak, and remains the same as Cortex-A73 for sustained performance.

The chart above does not start at zero, so it appears as though there are massive performance increases, but looking at the numbers we can see a 1.34x higher score with GeekBench, and 1.48x with Octane 2.0. Other benchmarks also show higher scores, but only between 1.16 and 1.33 times higher.


Cortex A75 cores will be manufactured using 10nm process technology, and clocked at up to 3.0 GHz. While (peak) performance will be higher than Cortex A73, efficiency will remain the same.

ARM Cortex A55


ARM Cortex A55 is the successor of Cortex-A53, with about twice the performance, and support for up to eight cores in a single cluster. There are octa-core (and even 24-core) ARM Cortex A53 processors, but they use multiple 4-core clusters.


Power efficiency is 15% better too, and ARM claims it is 10x more configurable, probably because of DynamIQ & 8-core cluster support.


If we have a closer look at the benchmarks released by the company, we can see the 2x performance increase is only valid for the LMBench memcpy memory benchmark, with other benchmarks from GeekBench v4 to SPECINT2006 showing 1.14x to 1.38x better performance. So integer performance appears to be only slightly better, floating point improves by close to 40%, and the most noticeable improvement is in memory bandwidth.

ARM Mali-G72 GPU


Mali-G72 will offer a 1.4x performance improvement over 2017 devices, which must mean Mali-G71…, and will allow for machine learning directly on the device instead of having to rely on the cloud, better games, and an improved mobile VR experience.


The new GPU is also 25% more efficient, and supports up to 32 shader cores. GEMM (general matrix multiplication) – used for example in machine learning algorithms – is improved by 17% over Cortex A73.
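
GEMM is the workhorse behind most neural-network layers, which is why a GPU improvement there matters for on-device machine learning. The operation itself is just C = αAB + βC; a naive pure-Python reference version (a real implementation would tile and vectorize it) looks like:

```python
# Naive reference GEMM (general matrix multiply): C = alpha*A*B + beta*C.
# This is the core operation GPUs accelerate for machine learning; real
# implementations tile the loops and vectorize the inner product.

def gemm(alpha, a, b, beta, c):
    n, k, m = len(a), len(a[0]), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            out[i][j] = alpha * acc + beta * c[i][j]
    return out

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 1], [1, 1]]
result = gemm(1.0, a, b, 0.0, c)  # reduces to a plain matrix product here
```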


Based on the information we’ve got from Qualcomm Snapdragon 845 leak, devices based on ARM Cortex A75/A55 processor and Mali-G72 GPU should start selling in Q1 2018. We may learn a few more details on Monday, once the embargo is lifted.

Google Releases Android O Developer Preview 2, Announces Android Go for Low-End Devices, TensorFlow Lite

May 18th, 2017 2 comments

After the first Android O developer preview released in March, Google has just released the second developer preview during Google I/O 2017. On top of features like PiP (picture-in-picture), notification channels, autofill, and others found in the first preview, it adds notification dots, a new Android TV home screen, smart text selection, and soon TensorFlow Lite. Google also introduced the Android Go project, optimized for devices with 512MB to 1GB of RAM.

Notification dots (aka Notification Badges) are small dots shown on the top right of app icons – in supported launchers – when a notification is available. You can then long press the icon to check out the notifications for the app, and dismiss or act on them. The feature can be disabled in the settings.

Android TV “O” also gets a new launcher that allegedly “makes it easy to find, preview, and watch content provided by apps”. The launcher is customizable as users can control the channels that appear on the homescreen. Developers will be able to create channels using the new TvProvider support library APIs.

I’ve found text selection in Android to be awkward and frustrating most of the time, but Android O brings improvements on that front with “Smart Text Selection”, which leverages on-device machine learning to let Android recognize entities like addresses, URLs, telephone numbers, and email addresses when you copy/paste.

TensorFlow is an open source machine learning library that, for example, allows image recognition. Android O will now support TensorFlow Lite, specifically designed to be fast and lightweight for embedded use cases. The company is also working on a new Neural Network API to accelerate computation, and both are planned for release in a future maintenance update of Android O later this year.

Finally, the Android Go project targets devices with 1GB or less of memory, and includes optimizations to the operating system itself, as well as optimizations to apps such as YouTube, Chrome, and Gboard to make them use less memory, storage space, and mobile data. The Play Store will also highlight apps with low resource requirements on such devices, while still providing access to the full catalog. “Android Go” will ship in 2018 for all Android devices with 1GB or less of memory.

You can test Android O developer preview 2 by joining the Android O beta program if you own a Nexus 5X, 6P, Nexus Player, Pixel, Pixel XL, or Pixel C device.

Open Source ARM Compute Library Released with NEON and OpenCL Accelerated Functions for Computer Vision, Machine Learning

April 3rd, 2017 12 comments

GPU compute promises to deliver much better performance than CPU compute for applications such as computer vision and machine learning, but the problem is that many developers may not have the right skills or time to leverage APIs such as OpenCL. So ARM decided to write their own ARM Compute Library, and has now released it under an MIT license.

The functions found in the library include:

  • Basic arithmetic, mathematical, and binary operator functions
  • Color manipulation (conversion, channel extraction, and more)
  • Convolution filters (Sobel, Gaussian, and more)
  • Canny Edge, Harris corners, optical flow, and more
  • Pyramids (such as Laplacians)
  • HOG (Histogram of Oriented Gradients)
  • SVM (Support Vector Machines)
  • H/SGEMM (Half and Single precision General Matrix Multiply)
  • Convolutional Neural Networks building blocks (Activation, Convolution, Fully connected, Locally connected, Normalization, Pooling, Soft-max)

The library works on Linux, Android, or bare metal on armv7a (32-bit) or arm64-v8a (64-bit) architectures, and makes use of NEON, OpenCL, or NEON + OpenCL. You’ll need an OpenCL capable GPU for the OpenCL code paths, so Mali-4xx GPUs won’t be fully supported, and you need an SoC with a Mali-T6xx, T-7xx, T-8xx, or G71 GPU to make full use of the library, except for NEON only functions.
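
To illustrate what one of the listed convolution filters actually computes, here is a naive horizontal Sobel filter in pure Python. This is only a reference sketch of the math; the point of the ARM Compute Library is to run exactly this kind of loop through hand-tuned NEON or OpenCL kernels instead:

```python
# Naive horizontal Sobel filter over a grayscale image (list of rows),
# illustrating one of the convolution filters the ARM Compute Library
# accelerates with NEON/OpenCL. Border pixels are left at zero.

SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

def sobel_x(img):
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0
            for dy in range(3):
                for dx in range(3):
                    acc += SOBEL_X[dy][dx] * img[y + dy - 1][x + dx - 1]
            out[y][x] = acc
    return out

# A vertical edge: left half dark, right half bright
image = [[0, 0, 255, 255]] * 4
edges = sobel_x(image)  # strong response along the dark/bright boundary
```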

In order to showcase their new library, ARM compared its performance to the OpenCV library on a Huawei Mate 9 smartphone with a HiSilicon Kirin 960 processor and an ARM Mali G71MP8 GPU.

ARM Compute Library vs OpenCV, single-threaded, CPU (NEON)

Even with some NEON acceleration in OpenCV, convolution and SGEMM functions are around 15 times faster with the ARM Compute Library. Note that ARM selected a hardware platform with one of their best GPUs, so while the library should still be faster on other OpenCL capable ARM GPUs, the difference will be smaller, although still significant, i.e. several times faster.


The performance boost for other functions is not quite as impressive, but the Compute Library is still 2x to 4x faster than OpenCV.

While the open source release only happened about three weeks ago, the ARM Compute Library had already been utilized by several embedded, consumer and mobile silicon vendors and OEMs before it was open sourced, for applications such as 360-degree camera panoramic stitching, computational camera, virtual and augmented reality, segmentation of images, feature detection and extraction, image processing, tracking, stereo and depth calculation, and several machine learning based algorithms.