Enabling AV1 Hardware Video Decoding in Windows 10

The royalty-free AOMedia AV1 video codec delivers up to 50% better compression than H.264 and up to 20% better than VP9 for the same video content, so streaming companies like Netflix and YouTube enabled the codec a while ago. When I play a YouTube video in Chrome in Ubuntu 20.04 and check the “Stats for nerds” info, it usually shows the video is played with the “av01.0” codec, which refers to AV1, but it is decoded in software using libraries like dav1d that leverage SIMD instructions. Ideally, you’d want hardware video decoding for lower power consumption, longer battery life, and potentially smoother playback. The good news is that Microsoft has recently announced support for AV1 GPU-accelerated hardware video decoding in Windows 10. The less good news is that support will be limited to recent and fairly powerful GPUs. So for instance, if you own mini PCs with older Intel […]
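The “av01.0” prefix mentioned above is the start of a longer AV1 codec string. As a rough illustration, the sketch below splits such a string into profile, level, tier, and bit depth, assuming the `av01.P.LLT.DD` layout used by the AV1 ISO media binding. The helper name `parse_av1_codec` and the sample string are made up for this example, and any optional trailing fields are ignored.

```python
from typing import NamedTuple

# Profile and tier names as defined for AV1 (assumed mapping for this sketch).
PROFILES = {0: "Main", 1: "High", 2: "Professional"}
TIERS = {"M": "Main", "H": "High"}


class Av1Codec(NamedTuple):
    profile: str
    level: str
    tier: str
    bit_depth: int


def parse_av1_codec(codec: str) -> Av1Codec:
    """Parse the mandatory fields of an 'av01.P.LLT.DD' codec string."""
    parts = codec.split(".")
    if len(parts) < 4 or parts[0] != "av01" or len(parts[2]) != 3:
        raise ValueError(f"not a full AV1 codec string: {codec!r}")
    profile = PROFILES[int(parts[1])]
    level_idx = int(parts[2][:2])   # two-digit sequence level index
    tier = TIERS[parts[2][2]]       # 'M' or 'H'
    # Level index maps to "major.minor": major = 2 + idx // 4, minor = idx % 4.
    level = f"{2 + (level_idx >> 2)}.{level_idx & 3}"
    bit_depth = int(parts[3])
    return Av1Codec(profile, level, tier, bit_depth)


if __name__ == "__main__":
    # Hypothetical sample string, not taken from the article.
    print(parse_av1_codec("av01.0.05M.08"))
```

Note that the codec string only tells you what is being decoded. Whether decoding happens in hardware or software depends on the browser, driver, and GPU.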

Arm Announces Cortex-A78AE CPU, Mali-G78AE GPU & Mali-C71AE ISP for autonomous automotive & industrial applications

Arm has announced a new CPU, GPU, and ISP specifically designed for autonomous automotive and industrial applications: respectively the Cortex-A78AE CPU, Arm Mali-G78AE GPU, and Arm Mali-C71AE ISP. Arm Cortex-A78AE CPU key features and specifications: Architecture – Armv8.2-A (AArch32 at EL0 only); Extensions – Armv8.1, Armv8.2, and Armv8.3 extensions (LDAPR instructions only), RAS extensions, Armv8.4 Dot Product, Cryptography extensions; Microarchitecture – up to 4x CPU cores per cluster, out-of-order pipeline, Neon / Floating Point Unit included with INT8 Dot Product and IEEE FP16, optional Cryptography Unit, 48-bit Physical Addressing (PA); Memory system and external interfaces – 32kB to 64kB L1 I-Cache / D-Cache, 256kB to 512kB L2 Cache, optional 512kB to 4MB L3 Cache, ECC support, LPAE, bus interfaces (AMBA ACE or CHI), optional ACP, peripheral port; Functional Safety Support – ASIL D Systematic and ASIL D Diagnostic; Security – TrustZone; Interrupts – External GICv4; Generic timer – […]
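To put the 48-bit physical addressing figure in perspective, here is a small arithmetic sketch showing how many bytes a given physical address width can reach. The helper name `addressable_tib` is made up for this example.

```python
def addressable_tib(address_bits: int) -> float:
    """Return the addressable physical memory, in TiB, for an address width."""
    total_bytes = 2 ** address_bits   # one byte per distinct address
    return total_bytes / 2 ** 40      # 1 TiB = 2**40 bytes


if __name__ == "__main__":
    for bits in (40, 48):
        print(f"{bits}-bit PA: {addressable_tib(bits):g} TiB")
```

So a 48-bit physical address space spans 256 TiB, against 1 TiB for a 40-bit one.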

Arm Officially Supports Panfrost Open-Source Mali GPU Driver Development

Most GPU drivers found in Arm processors are closed-source, which makes fixing bugs difficult and time-consuming since everybody needs to rely on the silicon vendor to fix them, and the vendor may even decide a particular bug is not important, so you’d be out of luck. So the developer community has long tried to reverse-engineer GPU drivers with projects like Freedreno (Qualcomm Adreno), Etnaviv (Vivante), as well as Lima and Panfrost for Arm Mali GPUs. Several years ago, Arm management was not interested at all in collaborating on open-source GPU driver development for Mali GPUs, but as noted by Phoronix, Alyssa Rosenzweig, a graphics software engineer employed by Collabora, explained during a talk at the annual X.Org Developers’ Conference (XDC 2020) that Panfrost development is now done in partnership with Arm. A recent merge commit confirms the move with Daniel Stone, Graphics […]

Intel Launches 11th Gen Intel Core “Tiger Lake” Processors with Intel Iris Xe graphics

Intel has officially launched Tiger Lake processors for thin-and-light laptops. The new 11th generation processors come with either Intel Iris Xe graphics or the older Intel UHD graphics, and nine processors are currently available, divided into two families: UP3 with 12W to 28W configurable TDP, and UP4 with 7W to 15W configurable TDP. Manufactured with a 10nm process, the new processors are said to deliver up to 2.7x faster content creation (photo editing), more than 20% faster office productivity in Office 365, and more than 2x faster gaming plus streaming when comparing an Intel Core i7-1185G7 processor to an AMD Ryzen 7 4800U. The new Intel Iris Xe graphics comes with up to 96 EUs and up to 16MB of L3 cache, and Intel claims it outperforms 90% of the discrete graphics usually paired with U-series processors. Tiger Lake processors also integrate a new DP4a instruction set for neural network inferencing […]
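For readers unfamiliar with DP4a, the general idea is a dot product of four packed 8-bit integers accumulated into a 32-bit value, which is the core operation of INT8 neural network inference. The sketch below emulates the signed variant in plain Python. It illustrates the concept only and is not Intel's exact instruction semantics; the function names are made up for this example.

```python
import struct


def pack_int8x4(lanes):
    """Pack four signed 8-bit values into one 32-bit word (little-endian)."""
    return struct.unpack("<I", struct.pack("<4b", *lanes))[0]


def dp4a(a: int, b: int, acc: int) -> int:
    """Emulate a signed DP4a: acc + dot(a, b) over four int8 lanes."""
    lanes_a = struct.unpack("<4b", struct.pack("<I", a))
    lanes_b = struct.unpack("<4b", struct.pack("<I", b))
    total = acc + sum(x * y for x, y in zip(lanes_a, lanes_b))
    # Wrap the result to a signed 32-bit accumulator.
    return (total + 2**31) % 2**32 - 2**31


if __name__ == "__main__":
    a = pack_int8x4((1, 2, 3, 4))
    b = pack_int8x4((5, 6, 7, 8))
    print(dp4a(a, b, 10))  # 1*5 + 2*6 + 3*7 + 4*8 + 10
```

Doing four multiply-accumulates in one instruction is what makes quantized INT8 models cheaper to run than their FP32 counterparts on such hardware.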

Perfetto Profiler Now Supports Mali GPU Hardware Counters via Panfrost

Perfetto is an open-source system profiler, app tracer, and trace analyzer for Linux, Android & Chrome platforms, and user-space apps. The program can already visualize CPU and memory usage, as well as power consumption. GPU support is more limited, with the program only capable of sampling the GPU frequency when the driver outputs that information via ftrace. Perfetto is also extensible thanks to a Tracing C++ SDK that “allows userspace applications to emit trace events and add more app-specific context to a Perfetto trace”. Collabora made use of the tracing SDK to add support for Mali Midgard GPU performance profiling in the gfx-pps project, using the Mali GPU hardware counters exposed via the Panfrost open-source Mali GPU driver. After following the installation instructions, you’ll be able to run the following executables for tracing and profiling: traced – tracing service; traced_probes – OS probes service; perfetto – command-line tool for recording traces; producer-gpu – providing the Panfrost data […]

GNOME Renders on Arm Mali-G31 Bifrost GPU with Fully Open Source Code

We first wrote about the Panfrost open-source Arm Mali GPU driver getting initial support for the Mali-G31 Bifrost GPU in late April, when engineers at Collabora managed to run some basic demos. Progress has been fast-paced, as the company has now implemented support for all major features of OpenGL ES 2.0 and some features of OpenGL 2.1. That means hardware based on the Arm Mali-G31 GPU, such as ODROID Go Advance (used for testing), can run Wayland compositors with zero-copy graphics, including GNOME 3, every scene in glmark2-es2 benchmarks, and some 3D games such as Neverball, all without any binary blobs. The company also claims to support the hardware-accelerated video players mpv and Kodi. The way it should work is that while the Panfrost driver renders the user interface, the Amlogic open-source video decoder developed by BayLibre handles hardware video decoding. All changes are already included in upstream Mesa with no out-of-tree patches required, and Bifrost support […]

Arm Announces Cortex-A78 CPU, Mali-G78 GPU, Ethos-N78 NPU and Custom Cortex-X1 Core

Arm has just announced its 2020 Arm Mobile IP portfolio with no fewer than five IP blocks including the Arm Cortex-A78 CPU, Arm Mali-G78 and G68 GPUs, the Arm Ethos-N78 neural processing unit, and the custom Cortex-X program starting with Cortex-X1, the most powerful Arm core to date. Arm Cortex-A78 CPU highlights: Architecture – Armv8-A (Harvard); Extensions – Armv8.1, Armv8.2, Cryptography, and RAS, Armv8.3 (LDAPR instructions only); ISA support – A64, A32, and T32 (at EL0 only); Microarchitecture – out-of-order pipeline, superscalar, Neon / Floating Point Unit, optional cryptography unit, up to 4 CPUs per cluster, 40-bit Physical Addressing (PA); Memory system and external interfaces – 32KB to 64KB L1 I-Cache / D-Cache, 256KB to 512KB L2 Cache, optional 512KB to 4MB L3 Cache, ECC and LPAE support; TrustZone security. Cortex-A78 delivers 20% extra performance compared to Cortex-A77 at the same power budget (one Watt), but peak […]

Panfrost Gets First 3D Renders on Bifrost GPU (Mali-G31) including Basic Texture Support

Collabora has been working on the Panfrost open-source Arm Mali GPU driver for over a year. The driver aims to support both the Midgard and Bifrost families, but so far the company had mostly focused on Midgard (Mali-T6xx/T7xx) GPUs with, for example, experimental OpenGL ES 3.0 support announced last February. Collabora engineers, such as Alyssa Rosenzweig, have now started to work on Bifrost support, and good progress has been made since they managed to have Panfrost render the first 3D graphics with basic texture support using a platform with an Arm Mali-G31 GPU. Alyssa notes that while Midgard and Bifrost have a similar command stream requiring only a few changes, the Bifrost instruction set is completely different and required building a new compiler from scratch. This leads to changes to the Intermediate Representation (IR), 16-bit data support, a different register allocation mechanism to adapt to irregular vector architectures, and the latter […]