Linux hardware video encoding on Amlogic A311D2 processor

I’ve spent a bit more time with Ubuntu 22.04 on the Khadas VIM4 Amlogic A311D2 SBC, and while the performance is generally good, features like 3D graphics acceleration and hardware video decoding are missing. But I was pleased to see a Linux hardware video encoding section in the Wiki, as it’s not something we often see supported early on. So I’ve given it a try…

First, we need a video in the NV12 pixel format, which is commonly output by cameras. I downloaded a 45-second 1080p H.264 sample video from Linaro and converted it with ffmpeg:


I did this on my laptop. As a raw video, it’s pretty big, with 3.3GB of storage used for just 45 seconds of footage:


Now let’s try to encode the video to H.264 on the Khadas VIM4 board using the aml_enc_test hardware video encoding sample:


The output explains the parameters used. There are some error messages, but the video can be played back with ffplay on my computer without issues.

[Image: Amlogic A311D2 H.264 video encoding sample]

We can also see that encoding took 26 seconds, which is faster than real-time since the video is 45 seconds long.

Let’s try the same with H.265 encoding:


Surprisingly, H.265 video encoding is quite a bit faster than H.264 video encoding. Let’s try H.264 encoding again:


Ah. It’s now taking less than 9 seconds. The first run is slow because the data has to be read from the eMMC flash, but since the file is only 3.3GB, it fits into the page cache in RAM, so there’s no storage bottleneck the second time.
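
To check this kind of cache effect independently of the encoder, a file can simply be read twice and each pass timed. The following is a minimal Python sketch, not part of the Khadas tools; the function names `timed_read` and `compare_runs` are made up for illustration, and the first pass is only "cold" if the file is not already in the page cache (for example right after a reboot).

```python
import sys
import time


def timed_read(path, chunk_size=1 << 20):
    """Read the whole file in 1 MiB chunks; return (bytes_read, seconds)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def compare_runs(path):
    """Read the same file twice and return both (bytes, seconds) results."""
    first = timed_read(path)
    second = timed_read(path)
    return first, second


if __name__ == "__main__" and len(sys.argv) > 1:
    runs = compare_runs(sys.argv[1])
    for label, (size, seconds) in zip(("first", "second"), runs):
        rate = size / 1e6 / seconds if seconds > 0 else float("inf")
        print(f"{label} read: {size} bytes in {seconds:.2f} s ({rate:.0f} MB/s)")
```

If the second pass is several times faster than the first, the first pass was limited by storage rather than by whatever consumed the data.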

[Image: Amlogic A311D2 H.265 hardware video encoding sample]

Nevertheless, the dump.h265 file also played fine on my computer, so the conversion was successful.

Amlogic A311D2 specifications say “H.265 & H.264 at 4Kp50” video encoding is supported. So let’s create a 45-second 4Kp50 video and convert it to NV12 YUV format. Oops, the size of the raw video is 27GB, and it won’t fit into the board’s eMMC flash… Let’s cut that to 30 seconds (about 18GB)…
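
The raw file sizes follow directly from the pixel format: NV12 stores 12 bits (1.5 bytes) per pixel. Here is a small Python sketch of that arithmetic, assuming "4K" means 3840×2160 (UHD); the helper names are only for illustration.

```python
def nv12_frame_bytes(width: int, height: int) -> int:
    """Bytes in one NV12 frame: a full-resolution Y plane plus an
    interleaved UV plane subsampled 2x2, i.e. 1.5 bytes per pixel."""
    return width * height * 3 // 2


def raw_video_bytes(width: int, height: int, fps: int, seconds: int) -> int:
    """Total size in bytes of an uncompressed NV12 clip."""
    return nv12_frame_bytes(width, height) * fps * seconds


if __name__ == "__main__":
    # Assumption: 4K here is UHD, 3840x2160.
    for seconds in (45, 30):
        size = raw_video_bytes(3840, 2160, 50, seconds)
        print(f"{seconds} s of 4Kp50 NV12: "
              f"{size / 1e9:.1f} GB ({size / 2**30:.1f} GiB)")
```

Under that assumption the 45-second clip comes out at roughly 28 GB (about 26 GiB) and the 30-second clip at roughly 18.7 GB, in line with the figures above.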

Now we can encode the video to H.264:


Two minutes to encode a 30-second video! That does not cut it, so let’s run the sample again:


It’s even slower… I really think the storage is the bottleneck here because the required read speed for that file would be over 600 MB/s for real-time encoding. The system would typically encode video from the camera stream, not from the eMMC flash. I should have run iozone before:


The sequential read speed is about 178MB/s. I have a MINIX USB Hub with a 480GB SSD that I had tested at 400MB/s. Not quite what we need, but we should see an improvement.

[Image: Khadas VIM4 USB-C SSD]

Sadly, the drive was not mounted, and it was not even recognized by tools like fdisk and GParted. When double-checking the Khadas VIM4 specifications, I realized the USB Type-C port is a USB 2.0 OTG interface that should recognize the drive but only supports 480 Mbps, so it’s a lost cause anyway… The only way to achieve over 600MB/s would be to use a USB 3.0 NVMe SSD, but I don’t have any.
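
To put the storage numbers side by side, here is a rough Python sketch that computes the read rate a raw 4Kp50 NV12 stream needs and how long merely reading the 30-second file would take from each source mentioned above. It assumes 3840×2160 frames, 1 MB = 10^6 bytes, and the theoretical 480 Mbps line rate for USB 2.0 (real throughput is lower); the function names are hypothetical.

```python
FRAME_BYTES_4K_NV12 = 3840 * 2160 * 3 // 2  # NV12: 1.5 bytes per pixel


def required_read_rate(fps, frame_bytes=FRAME_BYTES_4K_NV12):
    """Sustained read rate in MB/s needed to feed raw frames in real time."""
    return frame_bytes * fps / 1e6


def min_read_time(video_seconds, fps, storage_mb_per_s,
                  frame_bytes=FRAME_BYTES_4K_NV12):
    """Lower bound in seconds just to read the raw clip from storage."""
    total_bytes = frame_bytes * fps * video_seconds
    return total_bytes / (storage_mb_per_s * 1e6)


if __name__ == "__main__":
    print(f"4Kp50 NV12 needs {required_read_rate(50):.0f} MB/s")
    sources = {
        "eMMC (iozone sequential read)": 178.0,
        "USB SSD (as tested earlier)": 400.0,
        "USB 2.0 (480 Mbps line rate)": 480 / 8,
    }
    for name, speed in sources.items():
        seconds = min_read_time(30, 50, speed)
        print(f"{name}: at least {seconds:.0f} s to read the 30 s clip")
```

With these assumptions the eMMC alone needs about 105 seconds to deliver the 30-second clip, which is consistent with the roughly two-minute encode and supports the idea that storage, not the encoder, is the limit here.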

So instead, I’ll make a 5-second 4Kp50 video that’s about 2.9GB in size.

First run using H.265:


Second run:


One last try with H.264:


Not quite real-time, but it’s getting closer, and that means 4Kp30 should be feasible. That’s the result with a 5-second 4Kp30 NV12 video encoded with H.264:


Less than four seconds. So real-time 4Kp30 H.264 hardware video encoding is definitely working on the Amlogic A311D2 processor.

[Image: Amlogic A311D2 4Kp30 hardware video encoding]

It’s playing fine on my PC too.

It’s also possible to encode NV12 YUV images into JPEG, but it won’t work with the khadas user:


But no problem with sudo:


Probably just a simple permission issue. The task was performed in 44ms, and I can open dump.jpg (a screenshot) without issues.

[Image: JPEG hardware encoding on Amlogic A311D2]

If I use ffmpeg to convert the NV12 file to JPEG, presumably with software encoding, it takes just under 200ms:


aml_enc_test and jpeg_enc_test are nice little utilities to test hardware video/image encoding in Linux on the Amlogic A311D2, but the source code would be needed to integrate this into an application. It does not appear to be public at this time, so I’d assume it’s part of the Amlogic SDK. I’ll ask Khadas for the source code, or the method to get it.

9 Comments
tkaiser
2 years ago

> the second time there’s no bottleneck from storage.

That’s why I always run sbc-bench -m (monitoring mode) in parallel with such tests. To see which kind of task the system is spending time on (your 1st run an awful lot of %iowait for sure).

When switching to performance governor on all CPU clusters running iostat 5 instead consumes less resources. Though unadjusted cpufreq governor might be more interesting -> maybe low(er) CPU clockspeeds due to VPU busy and similar…

WereCatf
2 years ago

I’m not even surprised that one can’t use any of the well-known frameworks for utilizing hw-encoding/-decoding, whether it is OMX, VA-API, V4L2 or similar and instead would have to write device-specific software using Amlogic’s SDK.

Want to use some popular, already-existing open-source software? Nope, gotta fork it and (try to!) modify it to work with Amlogic’s libraries!

tkaiser
2 years ago

Software situation around media capabilities with Amlogic’s forward ported 5.4 mess explained more in detail: https://forum.khadas.com/t/khadas-vim4-is-coming-soon/15266/42?u=tkaiser

Not too surprising since the typical Linux userland has zero relevance for the ‘Android e-waste’ world…

tkaiser
2 years ago

BTW: if revisiting this topic it would be interesting to check SoC thermals while testing since Amlogic’s BSP kernel exposes half a dozen thermal sensors for A311D2: find /sys -name “*thermal”

With lm-sensors package installed this will work too ofc: while true ; do sensors; sleep 10; done

But exploring /sys a bit might be worth the efforts (clockspeeds/governors of memory, gpu, vpu and such things)

animtaknet
2 years ago

Would it help to run these tests with the media files on a ramdisk/tmpfs, forcing them to be in RAM regardless of any caching?

tkaiser
2 years ago

Nope for the following simple reasons:

  • in passive benchmarking mode you need to repeat each test at least 3 times (since passive benchmarking means you’ve no idea what you’re actually doing)
  • you always need to monitor the benchmark environment
  • Linux filesystem caches/buffers work fine since over a decade so the 2nd run will show the problem
Anders Kirchenbauer
2 years ago

The first time I tried doing YUV encoding for a video codec I was using a PowerMac G5 and a brand new firewire 800 external disk drive. I was taking an 854×480 NTSC MPEG2 stream from a DVD, and writing it to the hard drive at the same time I was reading the YUV (YUV4MPEG) file back in with the encoder software and writing the encoded file to the same external HDD. The drive had a physical failure in under 2 hours and was forevermore unusable. I switched to two different approaches. 1) mkfifo will make a file that will…

alex
2 years ago

i pay $199 only if i get the encoding and decoding source code otherwise i keep my rk3568 which has reasonable encoding performance.

m][sko
2 years ago

GOP should be something like 250 or 300 for better compression. It sets how often you want your key frames (frames encoded without reference to any other frame).
