Archive

Posts Tagged ‘gpu’

Allwinner SoCs with Mali GPU Get Mainline Linux OpenGL ES Support

September 26th, 2017 15 comments

OpenGL ES support in Linux for ARM SoCs is usually pretty hard to get because of closed-source binary blobs coupled with the manufacturers' focus on Android. Workarounds include open driver projects such as Freedreno for Qualcomm Adreno GPUs, Nouveau for NVIDIA Tegra, or Etnaviv for Vivante GPUs, as well as the libhybris library that converts Linux calls into Android calls in order to leverage existing Android GPU binary blobs. Allwinner processors rely on either PowerVR or ARM Mali GPUs; the former does not have any open source project, and while work is still ongoing for the latter with the Lima project, it's not ready yet.

That means that, so far, your only option was to use libhybris for either GPU family. The good news is that Free Electrons engineers have been working on OpenGL ES support for the ARM Mali GPU found in Allwinner processors, and have been allowed to release the userspace binary blobs. Not quite as exciting as an actual open source release, but at least we should now be able to use OpenGL ES with mainline Linux on most Allwinner SoCs (the ones not using a PowerVR GPU).

If you want to try it on your platform, you’ll first need to add ARM Mali GPU device tree definitions to your platform’s DTS file if it is not already there, before building the open source Mali kernel module for your board:
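The commands below are only a sketch based on the sunxi-mali repository from Free Electrons' Maxime Ripard (https://github.com/mripard/sunxi-mali); the toolchain prefix, kernel tree, rootfs path, and r6p2 release are assumptions to adapt to your board:

  git clone https://github.com/mripard/sunxi-mali.git
  cd sunxi-mali
  export CROSS_COMPILE=arm-linux-gnueabihf-   # cross toolchain prefix (assumed)
  export KDIR=/path/to/linux                  # kernel tree built for your board
  export INSTALL_MOD_PATH=/path/to/rootfs     # target root file system
  ./build.sh -r r6p2 -b                       # build the Mali kernel module
  ./build.sh -r r6p2 -i                       # install mali.ko into the rootfs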

This will install the mali.ko module to your rootfs. The final step is to get the userspace drivers, either fbdev or X11-dma-buf depending on your setup, for example:
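As a sketch, assuming the blobs are those from Free Electrons' mali-blobs repository (https://github.com/free-electrons/mali-blobs) and an fbdev setup with the r6p2 release (the exact sub-directory depends on the repository layout):

  git clone https://github.com/free-electrons/mali-blobs.git
  cd mali-blobs
  # copy the fbdev (or x11-dma-buf) userspace libraries to the target rootfs
  cp -a r6p2/fbdev/lib/lib_fb_dev/lib* /path/to/rootfs/usr/lib/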

That should be all for the installation. Based on the GitHub patchsets, this should currently work with Linux 4.6 to 4.14. You should then be able to test OpenGL ES using the es2_gears or glmark2-es2 programs, for example:
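On a Debian-based rootfs, that could look like this (package and binary names are assumptions that may vary between distributions):

  sudo apt-get install mesa-utils-extra glmark2-es2
  es2gears       # spinning gears rendered with OpenGL ES 2.0
  glmark2-es2    # OpenGL ES 2.0 version of the glmark2 benchmark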

Update: On a separate note, somebody has recently released ffmpeg 3.3.4 with the open source Cedrus driver for the Allwinner video processing unit, tested with Allwinner R40 and A64 SoCs. Code and packages can be found on GitHub.

Imagination Announces PowerVR Series2NX Neural Network Accelerator (NNA), and PowerVR Series9XE and 9XM GPUs

September 21st, 2017 3 comments

Imagination Technologies has just made two announcements: one for their PowerVR Series2NX neural network accelerator, and the other for their new PowerVR Series9XE and 9XM GPU families.

PowerVR Series2NX neural network accelerator


The company claims 2NX can deliver twice the performance at half the bandwidth of its nearest competitor, and that it's the first dedicated hardware solution with flexible bit-depth support from 16-bit down to 4-bit.

Key benefits of their solution (based on market data available in August 2017 from a variety of sources) include:

  • Highest inference/mW IP cores to deliver the lowest power consumption
  • Highest inference/mm2 IP cores to enable the most cost-effective solutions
  • Lowest bandwidth solution with support for fully flexible bit depth for weights and data including low bandwidth modes down to 4-bit
  • 2048 MACs/cycle in a single core, with the ability to reach higher levels with multiple cores

The PowerVR 2NX NNA is expected to be found in smartphones and other mobile devices leveraging the TensorFlow Lite API for Android as well as the Caffe2Go framework, smart surveillance cameras, assisted and autonomous driving solutions, and home entertainment with TVs and set-top boxes using artificial intelligence to adapt preferences to specific users. NNAs will find their way into more and more SoCs, as shown in the diagram below by Imagination illustrating the evolution of SoCs over the years; this has already started, as we've seen with the Huawei Kirin 970 mobile SoC featuring its own neural processing unit (likely not a 2NX though).


PowerVR 2NX development resources include mapping and tuning tools, sample networks, evaluation tools, and documentation leveraging industry standard machine learning frameworks such as Caffe and TensorFlow. The Imagination DNN (Deep Neural Network) API, working across multiple SoC configurations, should ease transitions between CPU, GPU, and NNA.

PowerVR 2NX NNA is available for licensing now, which should mean products with the solution may come sometime in 2018. Some more details about 2NX can be found in a blog post and on the product page.

PowerVR Series9XE and 9XM GPUs


The Series9XE GPU family is an update to the previous generation Series8XE family, with the same fill-rate density but application performance improved by up to 20%. The GPUs are expected to be used in cost-sensitive products such as digital TVs, set-top boxes, streaming sticks/dongles, and entry-level to mid-range mobiles and tablets.

The Series9XM family improves performance by up to 50% over the Series8XEP family thanks to increased compute density, and should be found in premium set-top boxes, mid-range smartphones, tablets, and automotive ADAS applications.

Both families benefit from improvements in the memory subsystem reducing bandwidth requirements by as much as 25%, include a new MMU and standard support for 10-bit YUV, and are suitable for 4K outputs/displays.

Some of the key benefits of the new Series9XE/9XM family include:

  • Performance/mm2
    • 9XE GPUs offer improved gaming performance while maintaining the same fillrate density compared to the previous generation
    • 9XM GPUs' several new and enhanced architectural elements enable up to 70% better performance density than the competition (as of August 2017), and up to 50% better than the previous 8XEP generation
  • Bandwidth savings of up to 25% over the previous generation GPUs through architectural enhancements including parameter compression and tile grouping
  • Memory system improvements: 36-bit addressing for improved system integration, improved burst sizes for efficient memory accesses, and enhanced compression capabilities
  • Low power consumption thanks to Imagination's Tile Based Deferred Rendering (TBDR) technology
  • Support for hardware virtualization and Imagination’s OmniShield multi-domain security, enabling customers to build systems in which applications and operating systems can run independently and reliably on the same platform
  • Support for Khronos graphics APIs: OpenGL ES 3.2, and Vulkan 1.0
  • Support for advanced compute and vision APIs such as RenderScript, OpenVX 1.1 and OpenCL 1.2 EP
  • Optional support for PVRIC3 PowerVR lossless image compression technology

The company also explains the Series9XE/9XM GPUs are ideal for use with the new PowerVR 2NX Neural Network Accelerator, which means NNAs will not only be found in premium devices, but also in entry-level and mid-range products.

The IP is available for licensing now with four Series9XE GPU IP cores:

  • 1 PPC with 16 FP32 FLOPS/clock (GE9000)
  • 2 PPC with 16 FP32 FLOPS/clock (GE9100)
  • 4 PPC with 32 FP32 FLOPS/clock (GE9210)
  • 8 PPC with 64 FP32 FLOPS/clock (GE9420)

and three Series9XM GPU IP cores:

  • 4 PPC with 64 FP32 FLOPS/clock (GM9220)
  • 4 PPC with 128 FP32 FLOPS/clock (GM9240)
  • 8 PPC with 128 FP32 FLOPS/clock (GM9240)

Visit the product page for more details about the new PowerVR GPU families.

Movidius Neural Compute Stick Shown to Boost Deep Learning Performance by about 3 Times on Raspberry Pi 3 Board

August 9th, 2017 14 comments

Intel recently launched the Movidius Neural Compute Stick (MvNCS) for low power USB based deep learning applications such as object recognition, and after some initial confusion, we could confirm the Neural Compute Stick can also be used with ARM based platforms such as the Raspberry Pi 3. Kochi Nakamura, who wrote the code for GPU accelerated object recognition on the Raspberry Pi 3 board, got hold of one sample in order to compare the performance between GPU and MvNCS acceleration.

His first attempt was quite confusing, as with GoogLeNet, Raspberry Pi 3 + MvNCS achieved an average inference time of about 560 ms, against 320 ms while using the VideoCore IV GPU in the RPi3 board. But then it was discovered that the "stream_infer.py" demo would only use one core out of the 12 VLIW 128-bit vector SHAVE processors in Intel's Movidius Myriad 2 VPU, and after enabling all 12 cores instead of just one, performance increased to around 108 ms average time per inference. That's almost 3 times faster compared to using the GPU in the RPi3 for this specific demo, and it may vary for other demos / applications.
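For reference, the number of SHAVE cores to use is selected when compiling the network graph with the SDK's mvNCCompile tool; a sketch below, where the GoogLeNet model file names are placeholders:

  # recompile the graph to use all 12 SHAVE vector processors (-s 12)
  mvNCCompile deploy.prototxt -w bvlc_googlenet.caffemodel -s 12 -o graph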

This is the description from the YouTube video:

Comparison of deep learning inference acceleration by Movidius’ Neural Compute Stick (MvNCS) and by Idein’s software which uses Raspberry Pi’s GPU (VideoCore IV) without any extra computing resources.

Movidius' demo runs GoogLeNet with 16-bit floating point precision. Average inference time is 108ms.
We used MvNC SDK 1.07.07 and their official demo script without any changes. (ncapi/py_examples/stream_infer/stream_infer.py)
It seems something is wrong with the inference results.
We recompiled the graph file with the -s12 option to use the 12 SHAVE vector processors simultaneously.

Idein’s demo also runs GoogLeNet with 32-bit floating point precision. Average inference time is 320ms.

It's interesting to note the GPU demo used 32-bit floating point precision, against 16-bit floating point precision on the Neural Compute Stick, although it's unclear to me how that may affect the performance of such algorithms. Intel recommends a USB 3.0 interface for the MvNCS, and the Raspberry Pi 3 only comes with USB 2.0 interfaces that share bandwidth between the USB webcam and the MvNCS, so it's possible an ARM board with a USB 3.0 interface for the stick, and a separate USB interface for the webcam, could perform better. Has anybody tested it? A USB 3.0 interface and hub would also allow cascading several Neural Compute Sticks.

Work on VideoCore V GPU Drivers Could Pave the Way for Raspberry Pi 4 Board

August 8th, 2017 26 comments

I've come across an article on Phoronix this morning about the VideoCore IV GPU used in Broadcom BCM283x "Raspberry Pi" processors, but part of the post also mentioned work related to VC5 drivers for the next generation VideoCore V GPU, written by Eric Anholt, who works for Broadcom and is in charge of the open source code related to the VideoCore IV GPU for Raspberry Pi. This led me to Eric's blog "This Week in VC4/VC5" and articles such as "2017-07-10: vc5, raspbian performance", where he explains he committed Mesa drivers for VC5:

I’ve just pushed a “vc5” branch to my Mesa tree (https://github.com/anholt/mesa/commits/vc5). This is the culmination of a couple of months of work on building a new driver for Broadcom’s V3D 3.3. V3D 3.3 is a GLES3.1 part, though I’m nowhere near conformance yet. This driver is for BCM7268, a set-top-box SOC that boots an upstream Linux kernel. I’m really excited to be working on a modern GLES implementation, and one that has its core platform support upstream already.

Raspberry Pi 3 is a nice little board, but competition is building with features not found in the RPi Foundation's board, such as Gigabit Ethernet, USB 3.0, 4K video playback, and so on. So even though Eben Upton has clearly said he is in no rush to release a Raspberry Pi 4, it will happen at some point, and in order to leverage existing software it would make sense to use an upgrade to the VideoCore IV GPU, like VideoCore V.

So I've tried to find more information about the BCM7268 SoC, but details are quite limited. We know it's designed for 4K Ultra HD set-top boxes, and features four cores delivering up to 14,000 DMIPS. Some speculated that the processor could be used in Raspberry Pi 4, but Jamesh – Raspberry Pi Engineer & Forum Moderator – poured cold water on the idea:

That’s a set top box chip, doesn’t cover the requirements for the next gen Pi.

So I tried to specifically find more details about the GPU, but again information is limited. The GPU supports OpenGL ES 3.1, and VideoCore V (V3D-530) is about 2.4 times faster than VideoCore IV (V3D-435) in the T-Rex GFXBench benchmark on dual core development boards. We don't know the GPU frequencies however, so that's just for reference.

Anyway, there's still plenty of time to find out, since the "official" timeline for Raspberry Pi 4 is sometime in 2019.

Android Can Now Boot with a Full Open Source Graphics Stack on NXP i.MX6 Boards

June 13th, 2017 3 comments

While the Android operating system is itself open source, it still relies on proprietary binary files to leverage GPU acceleration, VPU hardware decoding, wireless connectivity, and so on. It's been possible to run Android with an open source software graphics stack, but it's normally terribly slow and barely usable. Collabora has now announced it could boot Android with a full graphics stack on i.MX6 platforms using no proprietary blobs at all.

To do so, they leveraged the work done on the Etnaviv open source driver for Vivante GPUs, and added support for the different formats used for graphical buffers in Android to the Mesa library, using modifiers representing different properties of the buffers. They further explain:

Support was added in two places: Mesa and gbm_gralloc. Mesa has had support added to many of the buffer allocation functions and to GBM (which is the API provided by Mesa, that gbm_gralloc uses).

gbm_gralloc in turn had support added for using a new GBM API call, GBM_BO_IMPORT_FD_MODIFIER, which imports a buffer object as well as accompanying information like the modifier used by the buffer object in question.

The changes will be upstreamed, but in the meantime you can try it yourself by following the instructions on memcpy.io for i.MX6 boards such as the Sabre Lite or RDU2. A boot with Android 7.1 using the full open source graphics stack is demonstrated in the video below on the RDU2 board.

Via Phoronix

ARM Cortex-A75 & Cortex-A55 Cores, and Mali-G72 GPU Details Revealed

May 27th, 2017 23 comments

We already knew ARM Cortex A75 cores were coming thanks to a leak showing the Snapdragon 845 SoC will feature custom Cortex A75 cores, but we did not have many details. But since we live in a world where "to leak is glorious", we already have some slides, originally leaked through VideoCardz with the post now deleted, but Liliputing & TheAndroidSoul got some of the slides before deletion, so let's see what we've got here.

ARM Cortex A75

So ARM Cortex-A75 will be about 20% faster than Cortex A73 for single thread operation, itself already 30% faster than Cortex A72. It will also be the first DynamIQ capable processor, together with Cortex A55, with both cores potentially used in big.LITTLE configurations.

Cortex A75 only improves peak performance, and remains the same as Cortex-A73 for sustained performance.

The chart above does not start at zero, so it appears as though there are massive performance increases, but look at the numbers and we can see a 1.34x higher score with GeekBench, and 1.48x with Octane 2.0. Other benchmarks also have higher scores, but between 1.16 and 1.33 times higher.


Cortex A75 cores will be manufactured using a 10nm process technology, and clocked at up to 3.0 GHz. While (peak) performance will be higher than Cortex A73, efficiency will remain the same.

ARM Cortex A55


ARM Cortex A55 is the successor to Cortex-A53, with about twice the performance, and support for up to eight cores in a single cluster. There are octa-core (and even 24-core) ARM Cortex A53 processors, but they use multiple 4-core clusters.


Power efficiency is 15% better too, and ARM claims it is 10x more configurable, probably because of DynamIQ and 8-core cluster support.


If we have a closer look at the benchmarks released by the company, we can see the 2x performance increase is only valid for the LMBench memcpy memory benchmark, with other benchmarks from GeekBench v4 to SPECINT2006 showing 1.14x to 1.38x better performance. So integer performance appears to be only slightly better, floating point gets close to 40% better, and the most noticeable improvement is in memory bandwidth.

ARM Mali-G72 GPU


Mali-G72 will offer a 1.4x performance improvement over 2017 devices – which must mean Mali-G71 – and will allow for machine learning directly on the device instead of having to rely on the cloud, better games, and an improved mobile VR experience.


The new GPU is also 25% more efficient, and supports up to 32 shader cores. GEMM (general matrix multiplication) – used for example in machine learning algorithms – is improved by 17% over Mali-G71.


Based on the information we got from the Qualcomm Snapdragon 845 leak, devices based on ARM Cortex A75/A55 processors and Mali-G72 GPUs should start selling in Q1 2018. We may learn a few more details on Monday, once the embargo is lifted.

Getting Started with OpenCV for Tegra on NVIDIA Tegra K1, CPU vs GPU Computer Vision Comparison

May 24th, 2017 No comments

This is a guest post by Leonardo Graboski Veiga, Field Application Engineer, Toradex Brasil

Introduction

Computer vision (CV) is everywhere – from cars to surveillance and production lines, the need for efficient, low-power yet powerful embedded systems makes this one of the bleeding edge scenarios of technology development nowadays.

Since computer vision is a very computationally intensive task, running CV algorithms on an embedded system's CPU might not be enough for some applications. Developers and scientists have noticed that the use of dedicated hardware, such as co-processors and GPUs – the latter traditionally employed for graphics rendering – can greatly improve the performance of CV algorithms.

In the embedded scenario, things usually are not as simple as they look. Embedded GPUs tend to be different from desktop GPUs, thus requiring many workarounds to get extra performance from them. A good example of a drawback of embedded GPUs is that they are hardly supported by OpenCV – the de facto standard library for computer vision – thus requiring a big effort from the developer to achieve some performance gains.

The silicon manufacturers are paying attention to the growing need for graphics and CV-oriented embedded systems, and powerful processors are being released. This is the case with the NVIDIA Tegra K1, which has a built-in GPU using the NVIDIA Kepler architecture, with 192 cores and a processing power of 325 GFLOPS. In addition, this is one of the very few embedded GPUs in the market that supports CUDA, a parallel computing platform from NVIDIA. The good news is that OpenCV also supports CUDA.

And this is why Toradex has decided to develop a System on Module (aka Computer on Module) – the Apalis TK1 – using this processor. In it, the K1 SoC's quad core ARM Cortex-A15 CPU runs at up to 2.2 GHz, interfaced to 2GB of DDR3L RAM and a 16GB 8-bit eMMC. The full specification of the CoM can be found here.

The purpose of this article is to install the NVIDIA JetPack on the Apalis TK1 System on Module, thus also installing OpenCV for Tegra, and trying to assess how much effort is required to code some simple CV application accelerated by CUDA. The public OpenCV is also tested using the same examples, to determine if it is a viable alternative to the closed-source version from NVIDIA.

Hardware

The hardware employed in this article consists of the Apalis TK1 System on Module and the Apalis Evaluation Board. The main features of the Apalis TK1 have been presented in the introduction, and regarding the Apalis Evaluation Board, we will use the DVI output to connect to a display and the USB ports to interface a USB camera and a keyboard. The Apalis TK1 is presented in figure 1 and the Apalis Evaluation Board in figure 2:

Figure 1 – Apalis TK1

Figure 2 – Apalis Evaluation Board

System Setup

NVIDIA already provides an SDK package – the NVIDIA JetPack – that comes with all tools that are supported for the TK1 architecture. It is an easy way to start developing applications with OpenCV for Tegra support. JetPack also provides many source code samples for CUDA, VisionWorks, and GameWorks. It also installs the NVIDIA Nsight, an IDE that is based on Eclipse and can be useful for debugging CPU and GPU applications.

OpenCV for Tegra is based on version 2.4.13 of the public OpenCV source code. It is closed-source but free to use and benefits from NEON and multicore optimizations that are not present in the open-source version; on the other hand, the non-free libraries are not included. If you want or need the open-source version, you can find more information on how to build OpenCV with CUDA support here – these instructions were followed and the public OpenCV 2.4.13 was also tested during this article’s development.

Toradex provides an article on its developer website with concise information describing how to install JetPack on the Apalis TK1.

Regarding hardware, it is recommended that you have a USB webcam connected to the Apalis Evaluation Board, because the samples tested in this article often need a video source as input.

OpenCV for Tegra

After you have finished installing the NVIDIA JetPack, OpenCV for Tegra will already be installed on the system, as well as the toolchain required for compilation on the target. You must have access to a serial terminal by means of a USB to RS-232 adapter, or an SSH connection.

If you want to run Python code, an additional step on the target is required:
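A sketch, assuming the OpenCV4Tegra packages shipped with JetPack provide the Python bindings under the package name below:

  sudo apt-get update
  sudo apt-get install libopencv4tegra-python   # package name assumed from OpenCV4Tegra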

The easiest way to check that everything works as expected is to compile and run some samples from the public OpenCV repository, since it already has the CMake configuration files as well as some source code for applications that make use of CUDA:
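One possible way to do it, assuming the samples' CMake files pick up the installed OpenCV for Tegra through find_package(OpenCV):

  git clone --branch 2.4.13 --depth 1 https://github.com/opencv/opencv.git
  cd opencv/samples
  mkdir build && cd build
  cmake ..      # should locate the OpenCV installed by JetPack
  make -j4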

We can begin by testing a Python sample, for instance the edge detector, which can be run as shown below. The running application is displayed in figure 3.
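For example (the sample path comes from the 2.4 source tree; the script uses the first video device by default):

  python opencv/samples/python2/edge.py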

Figure 3 – running Python edge detector sample

After the samples are compiled, you can try some of them. A good choice is the "background/foreground segmentation" samples, since they are available with and without GPU support. You can run them with the commands below, and see the results in figures 4 and 5.
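Assuming the standalone sample build above, the commands could look as follows; the binary names follow OpenCV's cpp-example-/gpu-example- naming convention, so treat them as assumptions:

  # CPU implementation
  ./cpp-example-bgfg_segm
  # CUDA accelerated implementation
  ./gpu-example-bgfg_segm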

Figure 4 – running bgfg_segm CPU sample

Figure 5 – running bgfg_segm GPU sample

By running both samples it is possible to subjectively notice the performance difference. The CPU version has more delay.

Playing Around

After having things set up, the question comes: how easy is it to port some application from CPU to GPU, or even start developing with GPU support? It was decided to play around a little with the Sobel application that is well described in the Sobel Derivatives tutorial.

The purpose is to check if it’s possible to benefit from CUDA out-of-the-box, therefore only the function getTickCount from OpenCV is employed to measure the execution time of the main loop of the Sobel implementations. You can use the NVIDIA Nsight for advanced remote debugging and profiling.

The Code

The first code is run completely on the CPU, while in the first attempt to port to the GPU (the second code, which will be called CPU-GPU), the goal is to find functions analogous to the CPU ones, but with GPU optimization. In the last attempt to port, some improvements are made, such as creating filter engines, which reduces buffer allocation, and finding a way to replace the CPU function convertScaleAbs with GPU accelerated functions.

A diagram describing the loop for the three examples is provided in figure 6.

Figure 6 – CPU / CPU-GPU / GPU main loop for Sobel implementations

The main loop for the three applications tested is presented below. You can find the full source code for them on GitHub:

  • CPU only code:
  • CPU-GPU code:
  • GPU code:

The Tests

  • Each of the three examples is executed using a random picture in JPEG format as input.
  • The input picture dimensions in pixels that were tested are: 3483×2642, 2122×1415, 845×450 and 460×290.
  • Therefore, there are 12 runs in total (3 implementations × 4 image sizes).
  • The main loop is iterated 500 times for each run.
  • All of the steps described in figure 6 have their execution time measured.
  • The numbers presented in the results are the average values of the 500 iterations for each run.

The Results

The results presented are the total time required to execute the main loop – with and without image capture and display time, available in tables 1 and 2 – and the time each task takes to be executed, which is described in figures 7, 8, 9 and 10. If you want to have a look at the raw data or reproduce the tests, everything is in the aforelinked GitHub repository.

Table 1 – Main loop execution time, in milliseconds

Table 2 – Main loop execution time, discarding read and display image times, in milliseconds

Figure 7 – execution time by task – larger image (3483×2642 pixels)

Figure 8 – execution time by task – large image (2122×1415 pixels)

Figure 9 – execution time by task – small image (845×450 pixels)

Figure 10 – execution time by task – smaller image (460×290 pixels)

The Analysis

Regarding OpenCV for Tegra in comparison to the public OpenCV, the results point out that OpenCV for Tegra has been optimized, mostly in some CPU functions. Even when discarding the image read – which takes a long time to execute, and shows approximately a 2x gain – and the display frame execution times, OpenCV for Tegra still bests the open-source version.

When considering only OpenCV for Tegra, the tables show that using GPU functions without care might even make performance worse than using only the CPU. It is also possible to notice that, for these specific implementations, the GPU is better for large images, while the CPU is best for small images. Where there is a tie, it would be nice to have a power consumption comparison, which hasn't been done here; also keep in mind that this GPU code is not optimized as much as it could be.

Looking at figures 7 to 10, it can be seen that the Gaussian blur and the scale conversion from 16 bits to 8 bits got a big boost when running on the GPU, while the conversion of the original image to grayscale and the Sobel derivatives had their performance degraded. Another point of interest is the fact that transferring data from/to the GPU has a high cost, and this is, in part, one of the reasons why the first GPU port was unsuccessful – it had more copies than needed.

Regarding image size, it can be noticed that the image read and display have an impact in overall performance that might be relevant depending on the complexity of the algorithm being implemented, or how the image capture is being done.

There are probably many ways to make this code more optimized, be it by only using OpenCV, by combining custom CUDA functions with OpenCV, by writing the application fully in CUDA, or by using another framework or tool such as VisionWorks.

Two points that might be of interest regarding optimization still in OpenCV are the use of streams – asynchronous execution of code on the CPU/GPU – and zero-copy or shared memory, since the Tegra K1 has CPU and GPU shared memory supported by CUDA (see this NVIDIA presentation from GPU Technology Conference and this NVIDIA blog post for reference).

Conclusion

In this article, the installation of the NVIDIA JetPack SDK and deployment on the Toradex Apalis TK1 have been presented. Having this tool installed, you are able to use OpenCV for Tegra, thus benefiting from all of the optimizations provided by NVIDIA. The JetPack SDK also provides many other useful contents, such as CUDA, VisionWorks and GameWorks samples, and the NVIDIA Nsight IDE.

In order to assess how easy it is for a developer freshly introduced to the CV and GPU concepts to take advantage of CUDA, purely using OpenCV optimized functions, a CPU to GPU port of a Sobel filter application was written and tested. From this experience, some interesting results were found, such as the fact that the GPU indeed improves performance – and that the magnitude of this improvement depends on a series of factors, such as the size of the input image, the quality of the implementation (or developer experience), the algorithms being used, and the complexity of the application.

Having a myriad of sample source code available, it is easy to start developing your own applications, although care is required in order to make the Apalis TK1 System on Module yield its best performance. You can find more development information in the NVIDIA documentation, as well as the OpenCV documentation. Toradex also provides documentation about Linux usage on its developer website, and has a community forum. Hope this information was helpful, see you next time!

Android Play Store Tidbits – Blocking Unlocked/Uncertified/Rooted Devices, Graphics Drivers as an App

May 20th, 2017 10 comments

There have been at least two or three notable stories about the Play Store this week. It started with Netflix no longer installing from the Google Play Store on rooted devices, devices with unlocked bootloaders, or uncertified devices, and showing as "incompatible". AndroidPolice contacted Netflix, which answered:

With our latest 5.0 release, we now fully rely on the Widevine DRM provided by Google; therefore, many devices that are not Google-certified or have been altered will no longer work with our latest app and those users will no longer see the Netflix app in the Play Store.

So that means you need Google Widevine DRM in your device, which means many Android TV boxes may stop working with Netflix. You can check whether your device is certified by opening Google Play, tapping on Settings, scrolling to the bottom, and checking Device Certification to see if it is Certified or Uncertified (H/T jon for the tip).

I tried this on my Chinese phone, and unsurprisingly it is "Uncertified". AndroidPolice however successfully tested both Netflix 4.16 and Netflix 5.0.4 on an unlocked Galaxy S tab with Level 3 DRM, and both worked. So the only drawback right now is that you can't install Netflix from the Play Store, but it still works normally. Some boxes do not come with any DRM at all, which you can check with DRM Info, and they may not work at all (TBC).

We've now learned this will not only affect Netflix, as developers will be able to block installation of apps that fail "SafetyNet" checks, as explained at Google I/O 2017:

Developers will be able to choose from 3 states shown in the top image:

  • not excluding devices based on SafetyNet
  • excluding those that don’t pass integrity
  • excluding the latter plus those that aren’t certified by Google.

That means any dev could potentially block their apps from showing up and being directly installable in the Play Store on devices that are rooted and/or running a custom ROM, as well as on emulators and uncertified devices… This is exactly what many of you were afraid would happen after the Play Store app started surfacing a Device certification status.

This means it might become more complicated to install apps from the Google Play Store on some devices, and we may have to start side-loading apps again, or use other app stores. That's provided developers don't stop their apps from running altogether. The latter has been possible for years, as for example many mobile banking apps refuse to run on rooted phones.

I'll end with some better news: starting with Android O, it will be possible to update graphics drivers from the Play Store, just like you would update an app. Usually, a graphics driver update requires an OTA firmware update, or flashing a new firmware image manually, and it's quite possible this new feature has been made possible thanks to Project Treble.

Categories: Android Tags: Android, app, driver, drm, google, gpu, netflix, oreo