With the increasing need for video encoding, there are some breakthrough developments in hardware-accelerated video encoding for Linux. Bootlin has been working on the implementation of Hantro H1 hardware accelerated video encoding to support H.264 encoding on Linux which follows the company’s work on the previously-released open-source VPU driver for Allwinner processors.
Hantro H1 is a common hardware H.264 encoder, it can also do VP8 and JPEG. It is found in a few ARM SoCs including a lot of Rockchip (RK3288, RK3328, RK3399, PX30, RK1808) and NXP (i.MX 8M Mini). Depending on the version, it can support up to 1080p at 30 or 60 fps.
Here we can see different blocks used for encoding. Hantro H1 is a stateless hardware implementation which means it has no microcontroller or firmware running. As can be seen in the diagram, it has a pre-processor that can do things like cropping, rotation, scaling, stabilization, and CSC. It will produce slice NALUs which means no meta-data. There are also some constraints on specific parameters. It only supports I and P slices (no B slices) for embedded encoding. It will take references from P slices that need to be stored in specific reconstruction buffers. Hantro H1 also has internal rate control mechanisms.
Some of the acronyms above may require some explanations. H.264 has semantics and specific units which are called Network Abstraction Layer Units (NALUs). Each of them has a data header with a specific type that indicates the data inside each unit. For example, there is meta-data in NALUs such as the Sequence Parameter Set (SPS). So, this is meta-data for the whole sequence.
Basically, meta-data is just a series of bits that needs to be understood according to the syntax of H.264. Similarly, there is also a Picture Parameter Set (PPS) for each picture. These are for meta-data, likewise, for coded data, we have coded slice data. There is a specific format called Annex-B which will put the prefix before the beginning of each NALU. This makes it easier to find the start of the next one.
Macroblocks are subdivisions of pictures into blocks of 16×16 pixels. These are grouped as slices. So basically the coded information in the coded slice data in NALU will be composed of data around macroblocks.
For Hantro H1, a lot of parameters can be set which control the encoding. Some of them need to be set to a specific value for the bitstream to be valid. In the stateless case, the state is tracked by V4L2 driver and userspace code. Bootlin is using the media request API just like in stateless decoding to tie the buffers and the parameters. The reconstruction buffers must also be taken care of, as they need to be attached to the input buffers. The current rate control in stateless cases can be implemented by the V4L2 driver and userspace which will read the feedback data itself.
There is already some existing Hantro H1 code, for example on Chromium OS it is used with a downstream kernel driver that is pretty much stateless. There’s also a user-space implementation which is available as libv412plugins implementation. Also, the Rockchip has its own implementation of its own downstream kernel which is called MPP supports Hantro H1.
So, the approach that was taken to support this encoder was to use a mainline hantro driver that supports decoding with Hantro G1. Bootlin then added support for H.264 Hantro H1 encoding to the driver. The kernel implementation along with the userspace side is available on GitHub.
Since the first implementation, there are some issues that cannot be accepted as is, but this gives some ideas to work for a proper interface for status encoders. Future work will also involve a clean up of the Hantro encoding code and bring it upstream with a nice interface that might also affect other hardware like the Allwinner encoder, which is also stateless.