Combining OSTree and SW Containers for reliable IoT Device updates

CNXSoft: This is a guest post by Drew Moseley, Technical Solutions Architect at Toradex, explaining how the company updates the firmware of Linux IoT devices using OSTree (aka libostree), an open-source operating system build and deployment tool, together with Docker software containers.

Every day more and more connected devices are being brought to market and estimates for the total size of the Internet of Things (IoT) market are as high as $1.5 trillion by 2027. Gas pumps, medical devices, and point of sale systems are increasingly connected, making it virtually impossible to avoid interacting with these devices, even for complete Luddites. In the home, devices such as power meters, light switches, and security cameras are commonly internet-enabled allowing for smart home functionality.

The software complexity of these devices increases with their functionality, and the number of devices in the field with software defects is growing. In many cases, these systems are designed, produced, and shipped without any consideration given to providing software updates beyond the initial program load. That’s a serious problem, and it can extend far beyond inconveniencing the owner of the device or adding warranty or recall expense for the manufacturer. Vulnerable IoT devices can be aggregated into botnets whose sheer size has enabled large-scale attacks on critical infrastructure, such as the Distributed Denial of Service (DDoS) attack against the Dyn Domain Name System (DNS) provider, which caused service interruptions for large organizations such as Twitter and GitHub.

Why update devices?

The most obvious reason to provide software updates to devices in the field is to address software vulnerabilities. Not all vulnerabilities become exploits resulting in large-scale attacks as mentioned above, but the risk to your brand is significant. And with more and more devices in your users’ homes, it may be possible to chain vulnerabilities from multiple devices together to get broader access to your users’ data. In one memorable incident, an unnamed casino had its high-roller database breached through a vulnerability in an internet-enabled fish tank thermometer. To get a feel for the scale of the problem, consider that there are approximately 14 million lines of code in the 787 Dreamliner jet (likely limited to the avionics system, and not including things such as the in-flight entertainment systems), compared with about 28 million lines of code in just the Linux kernel (as of January 2020), and the kernel is only one part of a Linux system. That much code will undoubtedly contain many errors needing fixes throughout your product’s lifetime.

Providing an update capability for your devices also enables you to deliver new features to your users. Depending on your business model this can be helpful for long-term customer retention or simply for providing up-sell capabilities and increasing revenue. Given the benefits of update capabilities, you may wonder why any device would be shipped without this. I struggle to define a use case where software updates would be completely unneeded.

OTA Server

Any fully automated OTA update solution requires a server that lets your operations staff manage the device fleet. Discussing the server side in depth is out of scope for this article, but there are numerous options available. In general, you will want to pick an end-to-end solution, meaning that the update server and the update clients have been developed together as a full solution, or at the very least that the combination of server and client has been well tested and integrated.

Update methods

There are a few common methods to allow for software updates.

  • In-place package-based updates: this is the mechanism used by most desktop Operating Systems. Basically, an installer application or packaging system is run in the currently active OS image. This can install anything needed by the system but it may be difficult to ensure all the devices in your fleet are running the exact same binaries that you have tested in your design labs.
  • Asymmetric image updates: this method generally uses a separate installer partition that is able to download the appropriate images and overwrite the primary OS partition. This eliminates the concern of partially installed package sets that can occur with in-place updates, but it can result in long downtimes for your users. This is the method that, until recently, was employed by most mobile phone updates, and I’m sure we have all been annoyed by the amount of time these updates take.
  • Symmetric image updates (commonly called dual A/B updates): this method uses two fully redundant partitions, one active and one passive. While running from the active partition, the update client can download and install a full image into the passive partition. Since this can all happen in the background while your application code is active, it removes the downtime concerns that come with asymmetric images. However, since it uses fully redundant partitions, it generally takes more block device storage than the other methods.
  • OSTree-based updates: this is the subject of the next section. It provides a good mix of features, allowing for minimal device downtime without requiring extra storage to house redundant partitions.
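
To make the symmetric (A/B) method more concrete, here is a minimal Python sketch of the slot bookkeeping. The `Slots` class, the slot names, and the version strings are hypothetical and only model the idea; a real update client would write to block devices and coordinate with the bootloader.

```python
from dataclasses import dataclass, field


@dataclass
class Slots:
    """Hypothetical model of a dual A/B layout: one active slot, one passive."""
    active: str = "A"
    images: dict = field(default_factory=lambda: {"A": "v1", "B": None})

    @property
    def passive(self) -> str:
        return "B" if self.active == "A" else "A"

    def install(self, image: str) -> None:
        # The running (active) slot is never touched during the install.
        self.images[self.passive] = image

    def switch(self) -> None:
        # Stands in for the bootloader selecting the other slot on reboot.
        if self.images[self.passive] is None:
            raise RuntimeError("passive slot holds no image")
        self.active = self.passive


slots = Slots()
slots.install("v2")  # background install while "v1" keeps running
slots.switch()       # reboot into the freshly written slot
print(slots.active, slots.images[slots.active])
```

Calling `switch()` a second time returns to the previous slot, which is how this scheme provides rollback, at the cost of keeping two full images on the device.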

OSTree

The documentation for the OSTree project defines it as follows:

libostree is both a shared library and suite of command line tools that combines a “git-like” model for committing and downloading bootable filesystem trees, along with a layer for deploying them and managing the bootloader configuration.

This is a bit nebulous so let’s work through an example. First, we will create an empty repository in a subdirectory on our development workstation. OSTree is normally used for entire filesystems but for simplicity, we will use a directory for this example.


We have initialized an empty repository. You can see that it has created a number of empty directories and a single config file. The repository metadata has much in common with git. You see familiar directories such as refs/heads which are used similarly. Let’s now add a file to the repository:


We see that many new objects have been created. The .dirmeta, .dirtree, and .commit files track the file and directory metadata (permissions, ownership, etc.), the directory tree structure, and the commit metadata respectively. The file refs/heads/main contains the commit hash for the new commit:


Note also that the object created with the .file extension is identical to the file we created. This is commonly called content-addressable storage which simply means that the files in the object store are named based on their content. The name of the object (in this case 92/d6c7afcaedabd4504d2e16de3ffc200cd156ac733306cf7b8991f56859bcd5.file) is generated from the sha256sum of the file itself, as well as the file attributes.
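
The naming scheme can be sketched in a few lines of Python. This is a simplified illustration, not OSTree's implementation: the `store_object` helper is hypothetical, and it hashes only the file content, whereas OSTree also includes the file attributes, so the names it produces will not match those in a real repository.

```python
import hashlib
import tempfile
from pathlib import Path


def store_object(repo: Path, source: Path) -> Path:
    """Copy source into repo/objects under a name derived from its SHA-256."""
    data = source.read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    # The first two hex characters become the directory, the rest the file name.
    target = repo / "objects" / digest[:2] / (digest[2:] + ".file")
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # identical content is stored only once
        target.write_bytes(data)
    return target


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    (work / "hello.txt").write_text("Hello, OSTree\n")
    (work / "copy.txt").write_text("Hello, OSTree\n")
    first = store_object(work / "repo", work / "hello.txt")
    second = store_object(work / "repo", work / "copy.txt")
    print(first == second)  # both names map to the same object
```

Because the object name depends only on what is stored, two files with the same content resolve to a single object regardless of their original names.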


It is important to note that these files are actually hard links to the same filesystem blocks. This is an important principle of OSTree and is what makes it very space-efficient: any files that are unchanged between versions are not duplicated, resulting in significant savings both in block storage on the device and in download bandwidth when pulling new revisions.
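
The effect of hard links can be checked with a short Python sketch. The `checkout` and `same_blocks` helpers and the directory layout are illustrative assumptions, and the example needs a filesystem that supports hard links.

```python
import os
import tempfile
from pathlib import Path


def checkout(obj: Path, dest: Path) -> None:
    """Materialize a stored object at dest as a hard link, not a copy."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    os.link(obj, dest)


def same_blocks(a: Path, b: Path) -> bool:
    """True when both names refer to the same inode on the same device."""
    sa, sb = a.stat(), b.stat()
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    obj = root / "objects" / "ab" / "cdef.file"  # placeholder object name
    obj.parent.mkdir(parents=True)
    obj.write_text("unchanged between versions\n")
    # Two filesystem versions that both contain the unchanged file.
    checkout(obj, root / "v1" / "etc" / "motd")
    checkout(obj, root / "v2" / "etc" / "motd")
    print(same_blocks(root / "v1" / "etc" / "motd", root / "v2" / "etc" / "motd"))
    print(obj.stat().st_nlink)  # link count: the object plus two checkouts
```

All three names point at one inode, so the file's data occupies storage only once no matter how many versions reference it.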


Just as with git, we can remove the file and then check it back out from the repository.


Now let’s add a full root filesystem to the repository. I created a simple filesystem for a QEMU Arm device using Buildroot.


Now we will make a second version of the filesystem. I’ve used the previous Buildroot configuration and added the bc utility which is listed under Target packages\Miscellaneous when running make menuconfig.


Now, let’s say we decide we no longer want the bc binary. Without rebuilding, we can simply roll back to the previous release. First, we check that bc exists in our current filesystem; then we roll back; finally, we verify that bc is once again a symlink to busybox:
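
Conceptually, a rollback only moves a reference back to an earlier commit that is still in the repository. The Python sketch below models that idea; the `Commit` class, the `rollback` function, the identifiers, and the `/usr/bin/bc` path are all illustrative assumptions and not OSTree's API or data format.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Commit:
    checksum: str                      # placeholder ID, not a real OSTree hash
    tree: Dict[str, str]               # path -> what lives at that path
    parent: Optional["Commit"] = None  # previous version, kept in the repo


def rollback(head: Commit) -> Commit:
    """Return the commit that preceded head; nothing is rebuilt or deleted."""
    if head.parent is None:
        raise ValueError("no earlier commit to roll back to")
    return head.parent


v1 = Commit("v1", {"/usr/bin/bc": "symlink to busybox"})
v2 = Commit("v2", {"/usr/bin/bc": "standalone bc binary"}, parent=v1)

head = rollback(v2)
print(head.checksum, head.tree["/usr/bin/bc"])
```

The newer commit is left intact, so moving forward again is just as cheap as rolling back.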


The last feature that is important for an over-the-air update system is remote repositories. Similar to how git uses repositories, these are remotely accessible data stores containing OSTree metadata. This example is run on a Toradex Verdin i.MX8M Mini system running Torizon which is an industrial embedded Linux system based on OSTree. We connect it to the Toradex TorizonCore OSTree repository and check out the latest nightly release. There are additional details, not covered here, related to switching to the new version of the filesystem on boot so that it is an atomic operation.


With the set of features shown by OSTree, we have the basic features needed for an OTA update system:

  • Storage of multiple versions of the entire filesystem.
  • Retention of older versions to provide a robust rollback facility.
  • Hard links to optimize storage space.
  • Remote repositories that allow connected devices to download updates over the “air”.

Containers

Containers are a form of OS-level virtualization, providing isolated environments in which to run applications. They differ from virtual machines in that they do not virtualize the entire hardware platform and do not run a full operating system. Their primary use case is to encapsulate an application with all the dependencies, libraries, etc., that it needs to operate. Setting up an application in a container allows you to ensure that all the dependencies are met without having to install additional packages into the base OS. For example, if you are running a Node.js application, you will package it into a container with the JavaScript runtime and all other components needed. You can specify the versions of each of the dependencies, test everything together, and then deploy the exact combination; this removes the worry that the base OS image might have a different version of a specific package. Additionally, it allows different containers to contain different versions of components if needed; for instance, you can run one container with an application that needs Python v2, and another container with an application that needs Python v3.

In addition to dependency management, containers can provide isolation from other components of the system, potentially increasing security. Using standard features of the base OS kernel, containers can be limited to only certain parts of the filesystem, certain devices, and even a specific CPU in a multicore system. Depending on the container runtime you are using, you may be able to limit the overall CPU or memory usage of individual containers, allowing your system to adjust based on usage patterns.

The third main feature of container systems is their built-in delivery mechanism. Using Docker, one of the most popular container engines, you can create new containers that inherit functionality from the many base images provided by various software providers. If you have an application written in Python v3 and you want to run it in a Debian-style environment, you can create a Dockerfile that looks like the following:


You then create the myapp.py file in the same directory containing your application. We will use the following as our test app:
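
A minimal stand-in for such a test app, assuming it only needs to show that the container starts and to report the interpreter version and CPU architecture it runs on, could look like this. The `describe` function and the output format are illustrative choices.

```python
import platform
import sys


def describe() -> str:
    """Build a one-line report of the interpreter version and CPU architecture."""
    version = ".".join(str(n) for n in sys.version_info[:3])
    return f"Hello from Python {version} on {platform.machine()}"


if __name__ == "__main__":
    print(describe())
```

Printing the architecture makes it easy to tell whether the container is running on the desktop build machine or on the Arm target.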


You can now build and run this container directly on your build system with the following steps:


The first command builds the image and tags it with the name myapp and the revision latest. The second command runs the container using the start command specified by the CMD statement in the Dockerfile. Note that we are explicitly not running on our embedded device at this point. You can use containers to do a lot of development and troubleshooting on your desktop. For many application development tasks, this is more efficient than doing development directly on the embedded device.

When you are ready to run the container on your embedded device, you can copy the entire working directory to your device and rerun the above commands. This will work when you are testing or dealing with a small number of devices; however, as your fleet size increases, you need a better delivery mechanism.

Docker provides a convenient mechanism for sharing images. We have already used this when we specified the FROM statement in our Dockerfile. This instructs Docker to base our image on the python:buster image that is available on Docker Hub. You can also push your own images to Docker Hub or to any other Docker registry, such as a private one when you want to keep your containers private. Once you have created a Docker Hub account, you can create your custom image for the target architecture and publish it using the following:


Since we created versions for Arm32, Arm64, and AMD64, you can run your image from any system based on those architectures as follows:


The combination of a flexible, familiar software environment with built-in packaging and delivery mechanisms makes a compelling case for using containers as your application deployment environment.

OTA Updater

The combination of OSTree and containers provides a rich feature set from which we can develop full OTA update capabilities. Taken together, these two projects deliver a powerful system, capable of handling the needs of the connected devices being developed today.

As discussed above, OSTree provides:

  • A stable, power-tolerant mechanism for managing your Operating System binaries.
  • Efficient use of storage and download bandwidth, with the ability to reuse any unmodified files.
  • Rollback functionality to ensure a broken update does not render your fleet lifeless.

Using containers to house your application stack provides:

  • A flexible and familiar software development environment. Your developers can continue to use the tools they are already familiar with in their desktop Linux systems.
  • Built-in packaging and delivery. We don’t need to reinvent the wheel here and can take advantage of industry-proven solutions.
  • Active developer community. The amount of documentation, blog posts, and ready-made containers available for reuse is vast and can be used as a starting point for your development efforts.

Using such a system allows for updating any component in the system, including the kernel, device tree, and application code. Deployed properly, you can ensure the health of your device fleet throughout its lifetime.

For the update capability to be the most useful, it needs to be automatic, unattended, and available over a network connection of some kind. This is commonly called over-the-air (OTA), although the term does not necessarily imply a wireless connection. Users of these devices do not think of them as computers requiring maintenance, but rather as appliances that should “just work”. If updating requires user intervention, then you are likely to have many out-of-date devices in your fleet. There are use cases, such as medical devices, where the connectivity of the devices may be deliberately restricted, but even in these cases the update should be automated as much as is feasible, for example by running it when the device is connected to its docking station for charging.

Security

While security is not the point of this article, we would be remiss if we didn’t discuss it at least a bit. One of the biggest threats to any software system is the ability to run arbitrary code. Since the whole point of an OTA update system is to get new code installed and running on a system, extreme care must be taken to ensure that the code being installed has not been tampered with and is the expected image for your system.

Consider the following points:

  • Physical Security: You likely have control over the server infrastructure, so make sure you lock down physical access as much as feasible. Note, however, that this is unlikely to be possible for the client devices.
  • Transport Encryption: You must ensure that the transport between the client and server is properly encrypted. Ideally, you will use proper TLS certificate verification on both endpoints to ensure that you are talking to the expected device.
  • Image Verification: Your client devices need a mechanism to validate the images being installed. Cryptographic validation should be used to protect against arbitrary software installation.
  • Security Key Management: Any security architecture will rely on keys of some kind. The architecture should provide mechanisms to expire old keys and rotate in new keys, as well as providing appropriate protection of the keys.
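
As a small illustration of the image verification point, the following Python sketch compares a downloaded image against a SHA-256 digest. It assumes the expected digest comes from metadata whose signature has already been verified by other means, which is the harder part and is not shown here; the `image_matches` function and the sample payload are hypothetical.

```python
import hashlib
import hmac


def image_matches(image: bytes, expected_sha256: str) -> bool:
    """Check a downloaded image against the digest from trusted metadata."""
    actual = hashlib.sha256(image).hexdigest()
    # compare_digest runs in constant time, so it leaks no timing information.
    return hmac.compare_digest(actual, expected_sha256.lower())


payload = b"example firmware image"
trusted_digest = hashlib.sha256(payload).hexdigest()  # would come from signed metadata

print(image_matches(payload, trusted_digest))
print(image_matches(payload + b"tampered", trusted_digest))
```

A digest check alone only detects corruption or tampering relative to the metadata; the protection against arbitrary software installation comes from verifying the signature on that metadata with properly managed keys.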

There are open-source frameworks that provide extremely secure designs that you can use in implementing your OTA update system. The Update Framework and Uptane are two notable projects that you should consider if you need to design a custom update system.

There are numerous open-source projects that implement OTA update systems that you can integrate into your design to avoid the overhead and risk of designing your own system. The Torizon Platform is the project I am currently involved with, and it implements the full OTA system as described in this post. OSTree on its own provides limited security features, such as cryptographic signing of commit and delta objects. Torizon is based on the Uptane architecture, providing a ready-built, highly secure end-to-end OTA update solution.

Conclusion

We have discussed how to combine several Open Source projects as infrastructure to create a fully automated, end-to-end OTA update solution.

The use of OSTree for our primary operating system storage allows for a very space-efficient solution and does not require the use of fully redundant partitions. It provides atomic, transactional updates resulting in minimal downtime for device users. OSTree has been carefully designed to be resilient to unpredictable power cycles and allows for rollback when issues are detected with an update.

The use of containers for the application stack provides a convenient packaging and delivery mechanism that can be handled independently of the base operating system. Containers are relatively easy to use and many developers already have skills in working with them. You can choose a base image that matches your desktop Linux distribution, which will allow you to work in a familiar environment with a rich set of tools at your disposal. Or you can choose a container-optimized base image (such as Alpine Linux) that is designed to be small and inherently more secure; after all, the most secure software is that which is not installed.

Combining containers and OSTree gives the best of all worlds when considering build reproducibility, maintainability, and flexibility for your developers. Using a system such as Torizon provides a ready-made solution that uses the architecture described here. This allows you to quickly get started developing your application without worrying about the details of OSTree, containers, and OTA updates, safe in the knowledge that you have a solid solution for managing the lifetime of your device fleet.

Providing proper updates to your devices should be considered a must-have for any modern connected device design. The risk to your users and your brand is too great to overlook.
