
Posts Tagged ‘nvidia’

NVIDIA Xavier AI SoC Now Sampling, DRIVE IX & DRIVE AR SDKs Announced

January 8th, 2018 1 comment

Well over a year ago, NVIDIA introduced Xavier, their next generation self-driving and artificial intelligence processor, with eight custom ARM cores, a 512-core Volta GPU, and support for 8K video encoding and decoding. A few months ago, the company provided some more details and unveiled the NVIDIA DRIVE PX Pegasus A.I. computer for level 5 autonomous driving, with two Xavier processors and two next-generation NVIDIA GPUs delivering a total of 320 TOPS of computing power. For what it’s worth, 320 TOPS is about 3,200 times more than the Intel Movidius Neural Compute Stick.

CES 2018 has now started, and NVIDIA made several announcements related to the gaming and automotive markets, and confirmed that Xavier is now sampling to select customers.


What’s really new in the announcement is the addition of two new SDKs (software development kits) for the processor besides the original NVIDIA DRIVE AV autonomous vehicle platform:

  • DRIVE IX – Intelligent experience software development kit that will enable AI assistants for both drivers and passengers using sensors inside and outside the car.
  • DRIVE AR – Augmented reality SDK designed for interfaces that deliver information on points of interest along a drive, create alerts, and help navigate safely and easily.

This type of powerful hardware and software is, however, reserved for automotive customers, so most of us won’t be able to get hold of such a platform, but we may end up being users of the technology soon enough, as NVIDIA announced partnerships with Volkswagen, Uber, ZF (a tier-one automotive supplier working with Baidu), and Aurora, a US startup designing and building self-driving technology.

What’s the Best Android TV Box (2017/2018 Edition)?

December 26th, 2017 17 comments

Since I was often asked which TV box to buy, I wrote a guide entitled “What’s the best Android TV box?” in April 2016. Time has passed, new products have launched, I tested more devices, and got further reader feedback, so it’s time for an update.

There’s still no device that rules them all, and since everybody has different requirements and price points, what could be the best Android TV box ever to one person may be a piece of junk to another. Before purchasing a TV box, you should consider what you plan to do with it, and find the device that matches your needs and budget. So first, I’ll provide a list of things to look for – besides the SoC/RAM selection – before selecting three TV boxes that stand out (in no particular order), as well as alternatives worth looking at.

Things to Look for

The list is basically the same as last year, except that I added two new entries, for operating systems and extra features:

  • Operating System – There was a time when “Android TV box” only meant “Android” “TV Box”, but Google’s own TV box operating system has become more popular, and some companies have also started offering dual OS versions with Android and Linux running at the same time, mostly for server purposes. Here are the options you may consider:
    • Official Android TV OS – Pick such a device if you want the original experience with the Leanback launcher, and access to streaming services like Hulu, Netflix and so on. This operating system should come with all or most of the licenses needed for streaming, is specially designed for the large screen, and works well with the IR remote control. However, you’ll only be able to easily install apps specifically designed for the TV (e.g. no Chrome browser, unless you sideload it), and the system may not always work well with an air mouse or wireless keyboard/touchpad.
    • Unofficial Android TV OS – Same as above, except some licenses may be missing, so some streaming services may not work as well, or be limited to standard definition resolution.
    • Android OS – Most – not to say all – boxes you’ll find in China are running the Android operating system made for smartphones, with customizations for the big screen. Those devices have good flexibility, since you can install pretty much any app from the Google Play store, and they come with a launcher made for the big screen. The downside is that only parts of the interface or some apps will be usable with the IR remote control, so you’ll need to use an air mouse, wireless keyboard, or smartphone app to have good control of the device. Most boxes also lack proper DRM and other licenses, which may restrict the streaming services you can access, or at least the playback resolution.
    • Android + Linux – Dual boot systems have been around for a while, and IMHO are not very useful, so what I’m referring to here are systems with two operating systems running at the same time, with Android for media playback, and Linux for NAS/server functions. I’ve seen devices with OpenWrt or Debian so far.
  • History of regular firmware updates – If a company provides regular OTA (over-the-air) firmware updates, your device is likely to get better and better over time. The cheapest TV boxes normally follow the ship-and-forget model, so you can’t expect any improvements, unless some community members offer custom firmware.
  • Support forums – Most reputable companies selling to end users offer support forums. For cheaper boxes, you won’t get any support, except through communities like Freaktab.
  • 4K & HDR Support – If you want to purchase a device that will support 4K videos, and the latest HDR (High Dynamic Range features) you should look for devices with HDMI 2.0a for 3840×2160 or 4096×2160 output up to 60 Hz and HDR. Double check 4K video codecs support (10-bit HEVC/H.265, VP9, H.264), and make sure they can decode the framerate used for your videos. The latter is usually not a problem with H.265, but sometimes it could be for VP9 or H.264 since some systems can only handle 30 fps or 24 fps.
  • 5.1 or 7.1 HD audio pass-through support – In case you own an amplifier or A/V receiver capable of handling Dolby TrueHD, Dolby Atmos, DTS HD Master, DTS HD High Resolution, or DTS:X, you really need to check the reviews on this site or others, as many devices fall short despite claiming support.
  • Automatic frame rate switching – This is the ability of the device to match the monitor refresh rate to the video frame rate to avoid a phenomenon called micro stutter, which makes videos not as smooth as they could be at regular intervals, and is especially noticeable when the video is panning. If this is properly implemented, e.g. 24 fps videos played using a 24 Hz refresh rate on the monitor, then micro stutter disappears.
  • DRM support for HD and UHD video streaming – If you’re paying for video streaming services like Netflix, you’ll have to make sure they are specifically supported, with Widevine Level 1 DRM being a necessary, but not sufficient, condition for playing the videos at HD or UHD (4K) resolution. Most devices can only stream videos in SD resolution due to the lack of proper DRM and a hard-to-get “Netflix license”.
  • Thermal design and storage performance – Many Android TV boxes have similar specifications, but IMHO two key design choices especially impact the performance of apparently similar devices. Some TV boxes will overheat over time, leading to poor performance after a few minutes, while others with proper cooling will perform the same over hours. Fast storage will ensure the device boots quickly, apps load fast, and the device does not get slowed down while apps are installing or updating in the background.
  • Extra Features – You’d normally not care about those if all you want to do is streaming, but if you want more from your TV box, you could check for digital TV tuner(s) (DVB-T/T2/C, DVB-S2, ATSC..), an internal SATA bay, HDMI input for recording or broadcasting video from another device, etc…

MINIX NEO U9-H Media Hub


Criteria:

  • Operating System – Android 6.0.1 OS
  • History of regular firmware updates – MINIX normally updates their devices for about a year or so.
  • Support forums – MINIX forums are fairly active, so you should be able to get decent support from MINIX themselves or the community of users there.
  • 4K & HDR Support – HDMI 2.0a up to 4K @ 60 Hz is supported, with very good support for 4K 10-bit H.265, VP9 and H.264 videos.
  • 5.1 or 7.1 HD audio pass-through support – Dolby TrueHD and DTS HD audio pass-through both working.
  • Automatic frame rate switching – OK (Kodi 17.x)
  • DRM support for HD and UHD video streaming – Widevine Level 1 & Microsoft PlayReady implemented. However, Netflix can only play up to SD resolution, or possibly up to HD (720p) with a trick, but not Full HD nor UHD, since Netflix requires a separate agreement.
  • Thermal design and storage performance – Good cooling thanks to a large heatsink, and very fast internal storage.
  • Extra Features – Separate microphone jack, Kensington lock

Just like the MINIX NEO U1 I recommended last year, as long as you don’t need Netflix Full HD or 4K UHD playback, and are happy using their custom launcher and an air mouse, MINIX NEO U9-H should definitely be on your list of devices to consider. Please read the MINIX NEO U9-H review for details, taking into account that some bugs may have been fixed since my review in March 2017.

Price: $149.90 and up with NEO A3 Lite air mouse on Amazon US, GearBest, GeekBuying, and other sellers. You can also find the box only (without air mouse) for around $139.90.

U5PVR Deluxe Set-top Box and NAS


U5PVR Deluxe made it to the top three list because of all the extras, like tuners and a 3.5″ SATA drive bay, and the fact that it runs both Android TV OS (unofficial) and Debian.

Criteria:

  • Operating System – Unofficial Android TV 5.1 OS and Debian running at the same time. Android TV 7.1 is now also available, as well as a dual boot image with Enigma2.
  • History of regular firmware updates – The company has released several firmware updates since the review. The previous model was the U4 Quad Hybrid – launched in January 2016; last firmware update: November 2016. So a little under a year of firmware updates.
  • Support forums – Available on SmartSTB forums (Somewhat active), or Google+ (not so active). The device is not as popular as MINIX models, so you’ll have less users involved.
  • 4K & HDR Support – HDMI 2.0a up to 4K @ 60 Hz is supported, with very good support for 4K 10-bit H.265, VP9 and H.264 videos in Media Center app (but Kodi 17.x support needed some work)
  • 5.1 or 7.1 HD audio pass-through support – Dolby TrueHD and DTS HD audio pass-through worked in Media Center app.
  • Automatic frame rate switching – OK (Media Center app)
  • DRM support for HD and UHD video streaming – Support for Widevine L1 DRM and Netflix HD/4K (not in my June 2017 review, but see comments)
  • Thermal design and storage performance – Excellent internal storage performance, and no noticeable issue with cooling (See teardown for design)
  • Extra Features – SATA bay for a 2.5″ or 3.5″ drive, dual DVB-T/T2 tuner

If you live in a country where DVB-T/T2 is supported (or various combinations of DVB-T/T2/C, ATSC, DVB-S2 if you purchase an additional tuner board), and plan to use the Linux NAS features, U5PVR Deluxe certainly looks like a good candidate. However, if you mainly want to watch video streams from Netflix, Hulu, and other premium services, and use Kodi, there should be other devices that better fit your needs.

Price: $229.99 including shipping on Aliexpress.

Nvidia Shield Android TV (2017 Edition)

NVIDIA launched a smaller version of their popular Shield Android TV earlier this year, and while I have not reviewed the device myself, it’s one of the most popular Android TV boxes on the market.

Criteria:

  • Operating System – Official Android TV 7.0 (Upgrade to Oreo likely)
  • History of regular firmware updates – Nvidia has been providing upgrades since 2015 for the original model (around 6 times a year)
  • Support forums – Active SHIELD Android TV board on Nvidia Geforce forum.
  • 4K Support – HDMI 2.0a up to 4K @ 60 Hz is supported, with 10-bit H.265, VP9 and H.264 video playback @ 60 fps.
  • 5.1 or 7.1 HD audio pass-through support – Dolby TrueHD and DTS HD audio pass-through supported
  • Automatic frame rate switching – OK for Kodi and Plex at least.
  • DRM support for HD and UHD video streaming – Netflix HD & 4K officially supported, as well as Amazon Video
  • Thermal design and storage performance – Good storage performance, and I only read reports of isolated issues with overheating (i.e. not a design issue).
  • Extra Features – N/A

The NVIDIA TV box also features the most powerful GPU of any TV box, so it’s also an excellent 3D gaming console. Availability is still an issue, although the company has launched the model in some more countries this year. This also means the device can be pretty expensive once you factor in shipping, customs duties, and other fees (e.g. forward shipping) if you purchase it from a country where the device has not officially launched. Just like other devices running Android TV OS, not all apps will be available from the Play Store.

Price: Around $200 on Amazon US.

Other Alternatives

The three devices above are not the only ones to consider, and other alternatives could meet some people’s requirements.

  • Above $100
  • Below $100
    • Xiaomi Mi Box US – A good official Android TV option if you want to stream video from services like Vudu, Hulu, YouTube, Netflix… and don’t care about playing games or very high performance for other tasks.
    • Mecool M8S PRO+ – Sub $40 box based on Amlogic S905X SoC with 2GB RAM/16GB storage that supports unofficial Android TV 7.1 firmware, Netflix up to 1080p. [Please note warning about eMMC flash version in the linked post]
    • Various low cost Amlogic S905/S905X TV boxes compatible with LibreELEC (Kodi Linux distribution). Note that the stock Android firmware on those boxes may not be very good, so it’s better to only consider them for running the community-supported LibreELEC builds.

I hope this guide will help some of you decide which model to buy. Feel free to comment if you think another model should be part of the top 3, or the list of alternatives.

NVIDIA DRIVE PX Pegasus Platform is Designed for Fully Autonomous Vehicles

October 11th, 2017 1 comment

Many companies are now involved in the quest to develop self-driving cars, getting there step by step, with six levels of autonomous driving defined based on info from Wikipedia:

  • Level 0 – Automated system issues warnings but has no vehicle control.
  • Level 1 (”hands on”) – Driver and automated system share control over the vehicle. Examples include Adaptive Cruise Control (ACC), Parking Assistance, and Lane Keeping Assistance (LKA) Type II.
  • Level 2 (”hands off”) – The automated system takes full control of the vehicle (accelerating, braking, and steering), but the driver is still expected to monitor the driving, and be prepared to immediately intervene at any time. You’ll actually have your hands on the steering wheel, just in case…
  • Level 3 (”eyes off”) – The driver can safely turn their attention away from the driving tasks, e.g. the driver can text or watch a movie. The system may ask the driver to take over in some situations specified by the manufacturer such as traffic jams. So no sleeping while driving 🙂 . The Audi A8 Luxury Sedan was the first commercial car to claim to be able to do level 3 self driving.
  • Level 4 (”mind off”) – Similar to level 3, but no driver attention is ever required. You could sleep while the car is driving, or even send the car somewhere without you being in the driver’s seat. There’s a limitation at this level, as self-driving mode is limited to certain areas or special circumstances. Outside of these areas or circumstances, the vehicle must be able to safely park the car if the driver does not retake control.
  • Level 5 (”steering wheel optional”) – Fully autonomous car with no human intervention required, no other limitations

So the goal is obviously to reach level 5, which would allow robotaxis, or a car that safely drives you home whatever your alcohol or THC blood levels. This however requires lots of redundant (for safety) computing power, and current autonomous vehicle prototypes have a trunk full of computing equipment.

NVIDIA has condensed the A.I. processing power required for level 5 autonomous driving into the DRIVE PX Pegasus AI computer that’s roughly the size of a license plate, and capable of handling inputs from high-resolution 360-degree surround cameras and lidars, localizing the vehicle within centimeter accuracy, tracking vehicles and people around the car, and planning a safe and comfortable path to the destination.

The computer comes with four A.I. processors said to deliver 320 TOPS (trillion operations per second) of computing power, ten times faster than NVIDIA DRIVE PX 2, or about the performance of a 100-server data center according to Jensen Huang, NVIDIA founder and CEO. Specifically, the board combines two NVIDIA Xavier SoCs and two “next generation” GPUs with hardware accelerated deep learning and computer vision algorithms. Pegasus is designed for ASIL D certification with automotive inputs/outputs, including CAN bus, FlexRay, 16 dedicated high-speed sensor inputs for camera, radar, lidar and ultrasonics, plus multiple 10 Gbit Ethernet interfaces.

Machine learning works in two steps: training on the most powerful hardware you can find, and inferencing on cheaper hardware. For autonomous driving, data scientists train their deep neural networks on the NVIDIA DGX-1 AI supercomputer, for example simulating 300,000 miles of driving in five hours by harnessing 8 NVIDIA DGX systems. Once training is completed, the models can be updated over the air to NVIDIA DRIVE PX platforms, where inferencing takes place. The process can be repeated regularly so that the system is always up to date.

NVIDIA DRIVE PX Pegasus will be available to NVIDIA automotive partners in H2 2018, together with NVIDIA DRIVE IX (intelligent experience) SDK, meaning level 5 autonomous driving cars, taxis and trucks based on the solution could become available in a few years.

NVIDIA Unveils Open Source Hardware NVDLA Deep Learning Accelerator

October 4th, 2017 2 comments

NVIDIA is not exactly known for their commitment to open source projects, but to be fair, things have improved since Linus Torvalds gave them the finger a few years ago. Although they don’t seem to help much with the Nouveau drivers, I’ve usually read positive feedback about Linux support for their NVIDIA Jetson boards.

So this morning I was quite surprised to read that the company had launched NVDLA (NVIDIA Deep Learning Accelerator), a “free and open architecture that promotes a standard way to design deep learning inference accelerators”.

Comparison of two possible NVDLA systems – Click to Enlarge

The project is based on the Xavier hardware architecture designed for automotive products, is scalable from small to large systems, and is said to be a complete solution with Verilog and C-model for the chip, Linux drivers, test suites, kernel- and user-mode software, and software development tools, all available on Github’s NVDLA account. The project is not released under a standard open source license like MIT, BSD or GPL, but instead under NVIDIA’s own Open NVDLA license.

This is an ongoing project, and NVIDIA has a roadmap until H1 2018, at which point we should get FPGA support for accelerating software development, as well as support for TensorRT and other frameworks.

Via Phoronix

Short Demo with 96Boards SynQuacer 64-bit ARM Developer Box

September 27th, 2017 17 comments

Even if you are working on ARM platforms, you are still likely using an Intel or AMD x86 build machine, since there’s not really a good alternative in the ARM world. Linaro talked about plans to change that at Linaro Connect Budapest 2017 in March, and a few days ago, the GIGABYTE SynQuacer software development platform was unveiled with a Socionext SynQuacer SC2A11 24-core Cortex-A53 processor, and everything you’d expect from a PC tower, with a compartment for SATA drives, PCIe slots, memory slots, multiple USB 3.0 ports, and so on.


The platform was just demonstrated at Linaro Connect San Francisco right after the Linaro High Performance Computing keynotes by Kanta Vekaria, Technology Strategist at Linaro, and Yasuo Nishiguchi, Socionext’s Chairman & CEO.

If you have never used a system with more than 14 cores, you’d sadly learn that the Tux logos at boot time are only shown on the first line, skipping the remaining 10 cores of the 24-core system. It was hard to stomach, but I’m recovering… 🙂

The demo showed a system with an NVIDIA graphics card connected to the PCIe x16 port and leveraging the open source Nouveau drivers, but it’s also possible to use it as a headless “developer box”. The demo system booted quickly into Debian + Linux 4.13. They then played a YouTube video, and ran top on the developer box showing all 24 cores and 32GB RAM. That’s it. They also took questions from the audience. We learned that the system can build the Linux kernel in less than 10 minutes, they are working on SBSA compliance, and the system will be available through the 96Boards website, with a complete build with memory and storage expected to cost less than $1,000. The idea is to use any off-the-shelf peripherals typically found in x86 PC towers. We still don’t know if they take MasterCard though… The video below is the full keynote with the demo starting at the 52:30 mark.

NVIDIA Jetson TX1 Developer Kit SE Offered for $199 (Promo)

August 23rd, 2017 11 comments

Launched in 2015, the NVIDIA Jetson TX1 developer kit integrates some serious processing power with a Jetson TX1 module featuring a 256-core Maxwell GPU, four Cortex-A57 cores, 4GB RAM, and 16GB eMMC, plus plenty of ports and I/Os via a mini-ITX carrier board. The only problem is that it’s quite expensive, as it was launched with an official $599 price tag, and it’s still $579 on Amazon US. The good news is that NVIDIA decided to launch a promotion for the Jetson TX1 Developer Kit SE, based on the same $500+ development kit minus the USB cable and camera module, and offered for just $199.


Let’s refresh our memory with the board’s specifications:

  • Jetson TX1 module
    • NVIDIA Maxwell GPU with 256 NVIDIA CUDA Cores
    • Quad-core ARM Cortex-A57 MPCore Processor
    • 4 GB LPDDR4 Memory
    • 16 GB eMMC 5.1 Flash Storage
    • Connects to 802.11ac Wi-Fi and Bluetooth enabled devices
    • 10/100/1000BASE-T Ethernet
  • NVIDIA Jetson TX1 Carrier Board
    • USB – 1x USB 3.0 Type A, 1x USB 2.0 Micro AB (supports recovery and host mode)
    • HDMI
    • M.2 Key E
    • PCIe x4
    • Gigabit Ethernet
    • Full size SD card slot
    • SATA data and Power
    • GPIOs, I2C, I2S, SPI
    • TTL UART with flow control
  • Power Supply – External 19V AC adapter and power cord

The kit includes the NVIDIA Jetson TX1 Carrier Board, an AC adapter and power cord, antennas to connect to Wi-Fi enabled devices, 4x rubber feet, a Quick Start Guide, and a Safety Booklet. Various other optional accessories can also be added to your purchase, such as an HDMI cable, USB camera, USB cable, and so on.

In order to qualify for the discount, you need to be part of the NVIDIA Developer Program (free registration), and while the promotion is only available in the US and Canada, the company intends to offer the kit in “other geographies starting this September”.

Getting Started with OpenCV for Tegra on NVIDIA Tegra K1, CPU vs GPU Computer Vision Comparison

May 24th, 2017 No comments

This is a guest post by Leonardo Graboski Veiga, Field Application Engineer, Toradex Brasil

Introduction

Computer vision (CV) is everywhere – from cars to surveillance and production lines, the need for efficient, low-power yet powerful embedded systems is nowadays at the bleeding edge of technology development.

Since this is a very computationally intensive task, running computer vision algorithms on an embedded system’s CPU might not be enough for some applications. Developers and scientists have noticed that the use of dedicated hardware, such as co-processors and GPUs – the latter traditionally employed for graphics rendering – can greatly improve CV algorithm performance.

In the embedded scenario, things usually are not as simple as they look. Embedded GPUs tend to be different from desktop GPUs, thus requiring many workarounds to get extra performance from them. A good example of a drawback of embedded GPUs is that they are hardly supported by OpenCV – the de facto standard library for computer vision – thus requiring a big effort from the developer to achieve some performance gains.

The silicon manufacturers are paying attention to the growing need for graphics and CV-oriented embedded systems, and powerful processors are being released. This is the case with the NVIDIA Tegra K1, which has a built-in GPU using the NVIDIA Kepler architecture, with 192 cores and a processing power of 325 GFLOPS. In addition, this is one of the very few embedded GPUs in the market that supports CUDA, a parallel computing platform from NVIDIA. The good news is that OpenCV also supports CUDA.

And this is why Toradex has decided to develop a System on Module (aka Computer on Module) – the Apalis TK1 – using this processor. In it, the K1 SoC’s quad-core ARM Cortex-A15 CPU runs at up to 2.2GHz, interfaced to 2GB of DDR3L RAM and a 16GB 8-bit eMMC. The full specification of the CoM can be found here.

The purpose of this article is to install the NVIDIA JetPack on the Apalis TK1 System on Module, thus also installing OpenCV for Tegra, and to try to assess how much effort is required to code a simple CV application accelerated by CUDA. The public OpenCV is also tested using the same examples, to determine if it is a viable alternative to the closed-source version from NVIDIA.

Hardware

The hardware employed in this article consists of the Apalis TK1 System on Module and the Apalis Evaluation Board. The main features of the Apalis TK1 have been presented in the introduction, and regarding the Apalis Evaluation Board, we will use the DVI output to connect to a display and the USB ports to interface a USB camera and a keyboard. The Apalis TK1 is presented in figure 1 and the Apalis Evaluation Board in figure 2:

Figure 1 – Apalis TK1 – Click to Enlarge

Figure 2 – Apalis Evaluation Board – Click to Enlarge

System Setup

NVIDIA already provides an SDK package – the NVIDIA JetPack – that comes with all tools that are supported for the TK1 architecture. It is an easy way to start developing applications with OpenCV for Tegra support. JetPack also provides many source code samples for CUDA, VisionWorks, and GameWorks. It also installs NVIDIA Nsight, an IDE based on Eclipse that can be useful for debugging CPU and GPU applications.

OpenCV for Tegra is based on version 2.4.13 of the public OpenCV source code. It is closed-source but free to use and benefits from NEON and multicore optimizations that are not present in the open-source version; on the other hand, the non-free libraries are not included. If you want or need the open-source version, you can find more information on how to build OpenCV with CUDA support here – these instructions were followed and the public OpenCV 2.4.13 was also tested during this article’s development.

Toradex provides an article on its developer website with concise information describing how to install JetPack on the Apalis TK1.

Regarding hardware, it is recommended that you have a USB webcam connected to the Apalis Evaluation Board, because the samples tested in this article often need a video source as input.

OpenCV for Tegra

After you have finished installing the NVIDIA JetPack, OpenCV for Tegra will already be installed on the system, as well as the toolchain required for compilation on the target. You must have access to the serial terminal by means of a USB to RS-232 adapter or an SSH connection.

If you want to run Python code, an additional step on the target is required:

The easiest way to check that everything works as expected is to compile and run some samples from the public OpenCV repository, since it already has the CMake configuration files as well as some source code for applications that make use of CUDA:

We can begin by testing a Python sample, for instance the edge detector. The running application is displayed in figure 3.

Figure 3 – running Python edge detector sample – Click to Enlarge

After the samples are compiled, you can try some of them. Nice ones to try are the “background/foreground segmentation” samples, since they are available with and without GPU support. You can run them from the commands below, as well as see the results in figures 4 and 5.

Figure 4 – running bgfg_segm CPU sample – Click to Enlarge

Figure 5 – running bgfg_segm GPU sample – Click to Enlarge

By running both samples it is possible to subjectively notice the performance difference. The CPU version has more delay.

Playing Around

After having things set up, the question is: how easy is it to port an application from CPU to GPU, or even to start developing with GPU support? It was decided to play around a little with the Sobel application that is well described in the Sobel Derivatives tutorial.

The purpose is to check if it’s possible to benefit from CUDA out of the box, therefore only the getTickCount function from OpenCV is employed to measure the execution time of the main loop of the Sobel implementations. You can use NVIDIA Nsight for advanced remote debugging and profiling.
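As a rough illustration of the measurement approach, here is a minimal timing sketch using getTickCount and getTickFrequency; the loop body is a placeholder, not the article’s actual benchmarking code:

#include <opencv2/core/core.hpp>
#include <iostream>

int main()
{
    const int iterations = 500;                 // same iteration count as the tests below
    double start = (double)cv::getTickCount();  // tick count before the main loop

    for (int i = 0; i < iterations; ++i)
    {
        // ... Sobel main loop would go here ...
    }

    double avg_ms = ((double)cv::getTickCount() - start)
                    / cv::getTickFrequency() * 1000.0 / iterations;
    std::cout << "Average main loop time: " << avg_ms << " ms" << std::endl;
    return 0;
}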

The Code

The first code is run completely on the CPU, while in the first attempt to port to GPU (the second code, which will be called CPU-GPU), the goal is to try to find functions analogous to the CPU ones, but with GPU optimization. In the last attempt to port, some improvements are made, such as creating filter engines, which reduces buffer allocation, and finding a way to replace the CPU function convertScaleAbs with GPU-accelerated functions.

A diagram describing the loop for the three examples is provided in figure 6.

Figure 6 – CPU / CPU-GPU / GPU main loop for Sobel implementations

The main loop for the three applications tested is presented below. You can find the full source code for them on Github:

  • CPU only code:
  • CPU-GPU code:
  • GPU code
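Since the full listings live in the linked GitHub repository, the snippet below is only a minimal sketch of what the CPU-only and the naive CPU-GPU loop bodies look like, loosely following the OpenCV 2.4 Sobel Derivatives tutorial referenced above; variable names, the grayscale/blur ordering, and the convertScaleAbs replacement (gpu::abs plus an 8-bit conversion) are my assumptions, not the exact code used for the measurements:

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/gpu/gpu.hpp>

// CPU-only version of the loop body
void sobel_cpu(const cv::Mat& src, cv::Mat& grad)
{
    cv::Mat gray, blurred, grad_x, grad_y, abs_grad_x, abs_grad_y;
    cv::cvtColor(src, gray, CV_BGR2GRAY);
    cv::GaussianBlur(gray, blurred, cv::Size(3, 3), 0);
    cv::Sobel(blurred, grad_x, CV_16S, 1, 0, 3);        // x derivative
    cv::Sobel(blurred, grad_y, CV_16S, 0, 1, 3);        // y derivative
    cv::convertScaleAbs(grad_x, abs_grad_x);            // back to 8-bit
    cv::convertScaleAbs(grad_y, abs_grad_y);
    cv::addWeighted(abs_grad_x, 0.5, abs_grad_y, 0.5, 0, grad);
}

// Naive CPU-GPU version: same steps using cv::gpu counterparts, with explicit
// upload/download; note the gpu module supports fewer src/dst type combinations
// than the CPU functions, so depth choices should be checked against the 2.4 docs.
void sobel_gpu(const cv::Mat& src, cv::Mat& grad)
{
    cv::gpu::GpuMat d_src, d_gray, d_blurred, d_grad_x, d_grad_y;
    cv::gpu::GpuMat d_abs_x, d_abs_y, d_abs_x8u, d_abs_y8u, d_grad;
    d_src.upload(src);                                   // host-to-device copy
    cv::gpu::cvtColor(d_src, d_gray, CV_BGR2GRAY);
    cv::gpu::GaussianBlur(d_gray, d_blurred, cv::Size(3, 3), 0);
    cv::gpu::Sobel(d_blurred, d_grad_x, CV_16S, 1, 0, 3);
    cv::gpu::Sobel(d_blurred, d_grad_y, CV_16S, 0, 1, 3);
    cv::gpu::abs(d_grad_x, d_abs_x);                     // convertScaleAbs has no direct
    cv::gpu::abs(d_grad_y, d_abs_y);                     // GPU equivalent: abs() then
    d_abs_x.convertTo(d_abs_x8u, CV_8U);                 // convert to 8-bit
    d_abs_y.convertTo(d_abs_y8u, CV_8U);
    cv::gpu::addWeighted(d_abs_x8u, 0.5, d_abs_y8u, 0.5, 0, d_grad);
    d_grad.download(grad);                               // device-to-host copy
}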

The Tests

  • Each of the three examples is executed using a random picture in JPEG format as input.
  • The input picture dimensions in pixels that were tested are: 3483×2642, 2122×1415, 845×450 and 460×290.
  • The main loop is iterated 500 times for each run.
  • All of the steps described in figure 6 have their execution time measured. This section will present the results.
  • Therefore, there are 12 runs in total.
  • The numbers presented in the results are the average values of the 500 iterations for each run.

The Results

The results presented are the total time required to execute the main loop – with and without image capture and display time, available in tables 1 and 2 – and the time each task takes to be executed, which is described in figures 7, 8, 9 and 10. If you want to have a look at the raw data or reproduce the tests, everything is in the aforelinked GitHub repository.

Table 1 – Main loop execution time, in milliseconds

Table 2 – Main loop execution time, discarding read and display image times, in milliseconds

Figure 7 – execution time by task – larger image (3483×2642 pixels) – Click to Enlarge

Figure 8 – execution time by task – large image (2122×1415 pixels) – Click to Enlarge

Figure 9 – execution time by task – small image (845×450 pixels) – Click to Enlarge

Figure 10 – execution time by task – smaller image (460×290 pixels) – Click to Enlarge

The Analysis

Regarding OpenCV for Tegra in comparison to the public OpenCV, the results point out that OpenCV for Tegra has been optimized, mostly for some CPU functions. Even when discarding the image read – which takes a long time to execute, and has approximately a 2x gain – and display frame execution times, OpenCV for Tegra still bests the open-source version.

When considering only OpenCV for Tegra, from the tables it is possible to see that using GPU functions without care might even make the performance worse than using only the CPU. Also, it is possible to notice that, for these specific implementations, the GPU is better for large images, while the CPU is best for small images. Where there is a tie, it would be nice to have a power consumption comparison, which hasn’t been done, or to also consider the fact that this GPU code is not optimized as well as possible.

Looking at figures 7 to 10, it can be seen that the Gaussian blur and the scale conversion from 16 bits to 8 bits had a big boost when running on the GPU, while the conversion of the original image to grayscale and the Sobel derivatives had their performance degraded. Another point of interest is the fact that transferring data from/to the GPU has a high cost, and this is, in part, one of the reasons why the first GPU port was unsuccessful – it had more copies than needed.

Regarding image size, it can be noticed that the image read and display have an impact on overall performance that might be relevant depending on the complexity of the algorithm being implemented, or on how the image capture is being done.

There are probably many ways to make this code more optimized, be it by only using OpenCV; by combining custom CUDA functions with OpenCV; by writing the application fully in CUDA; or by using another framework or tool such as VisionWorks.

Two points that might be of interest regarding optimization still in OpenCV are the use of streams – asynchronous execution of code on the CPU/GPU – and zero-copy or shared memory, since the Tegra K1 has CPU and GPU shared memory supported by CUDA (see this NVIDIA presentation from GPU Technology Conference and this NVIDIA blog post for reference).
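For reference, here is a minimal illustrative sketch of what those two ideas look like with the OpenCV 2.4 gpu API (not code from the article; for truly asynchronous copies the host buffers should be page-locked, and zero-copy mapping must be supported by the device, as it is on the Tegra K1):

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/gpu/gpu.hpp>

void stream_and_zero_copy_sketch(const cv::Mat& frame)
{
    // 1) Streams: queue copies and kernels asynchronously, then wait once
    cv::gpu::Stream stream;
    cv::gpu::GpuMat d_frame, d_gray;
    cv::Mat gray;
    stream.enqueueUpload(frame, d_frame);                        // async host-to-device copy
    cv::gpu::cvtColor(d_frame, d_gray, CV_BGR2GRAY, 0, stream);  // queued on the same stream
    stream.enqueueDownload(d_gray, gray);                        // async device-to-host copy
    stream.waitForCompletion();                                  // CPU is free until here

    // 2) Zero-copy: page-locked host memory mapped into the GPU address space,
    //    avoiding explicit copies on shared-memory devices such as the Tegra K1
    cv::gpu::CudaMem shared(frame.size(), frame.type(), cv::gpu::CudaMem::ALLOC_ZEROCOPY);
    frame.copyTo(shared.createMatHeader());                      // CPU view of the buffer
    cv::gpu::GpuMat d_view = shared.createGpuMatHeader();        // GPU view, same memory
    cv::gpu::cvtColor(d_view, d_gray, CV_BGR2GRAY);              // no upload required
}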

Conclusion

In this article, the installation of the NVIDIA JetPack SDK and deployment on the Toradex Apalis TK1 have been presented. Having this tool installed, you are able to use OpenCV for Tegra, thus benefiting from all of the optimizations provided by NVIDIA. The JetPack SDK also provides much other useful content, such as CUDA, VisionWorks and GameWorks samples, and the NVIDIA Nsight IDE.

In order to assess how easy it is for a developer freshly introduced to CV and GPU concepts to take advantage of CUDA, purely using OpenCV optimized functions, a CPU to GPU port of a Sobel filter application was written and tested. From this experience, some interesting results were found, such as the fact that the GPU indeed improves performance, and that the magnitude of this improvement depends on a series of factors: the size of the input image, the quality of the implementation (or developer experience), the algorithms being used, and the complexity of the application.

Having a myriad of sample source code available, it is easy to start developing your own applications, although care is required in order to make the Apalis TK1 System on Module yield its best performance. You can find more development information in the NVIDIA documentation, as well as the OpenCV documentation. Toradex also provides documentation about Linux usage on its developer website, and has a community forum. Hope this information was helpful, see you next time!

NVIDIA Shield Android TV Gets Unofficial USB Tuner (ATSC/DVB) Support

March 9th, 2017 3 comments

NVIDIA Shield Android TV may only be available in a limited number of countries, but if you happen to live in a country where it’s officially sold, it can be one of the best options due to its hard-to-beat price to performance ratio, and official Android TV software support from Google & NVIDIA. One feature it does not support out of the box is digital TV tuners, but linux4all has released an unofficial firmware image adding USB TV tuner support to Android TV (7.0) on NVIDIA Shield Android TV 2015 and 2017 models.

You’ll first need a supported tuner, either Hauppauge WinTV-dualHD (DVB-C, DVB-T and DVB-T2), Hauppauge WinTV-HVR-850 (ATSC), Hauppauge WinTV-HVR-955Q (ATSC, QAM, Analog), or Sony PlayTV dual tuner (DVB-T). More tuners may be supported in the future. Once you’ve got your tuner connected to NVIDIA Shield Android TV, make sure you have the latest Android TV 7.0 OTA update, unlock the bootloader, and flash the specific bootloader as explained in the aforelinked forum post. Upon reboot you should see “USB TV Tuner Setup” in the interface. Go through it and scan for channels.

Finally, connect a USB 3.0 hard drive or micro SD card with at least 50GB, select “format as device storage”, and you should be able to watch free-to-air TV and record it as needed using Live Channels.

If you are interested in adding more tuners, fixing bugs, or possibly implementing this for another Android TV box, you’ll find the Linux source code with change history on Github.

Note that it’s not the first hack to use USB tuners on the Shield, as last year somebody used Kodi + TVheadend, so the real news here is probably the integration into Android TV’s Live Channels.

Via AndroidTv.News, and thanks to Harley for the tip.