Linaro has recently released the full schedule of Linaro Connect San Diego 2019, which will take place on September 23-27. Even if you can’t attend, it’s always interesting to check out the schedule to find out what work is being done on Arm Linux, Zephyr OS, and so on.
- 14:00 – 14:25 – SAN19-101 Thermal Governors: How to pick the right one by Keerthy Jagadeesh, Software Engineer, Texas Instruments
With higher clock speeds and multiple cores packed into a SoC, the need for thermal management on Arm-based SoCs gets more and more critical. Thermal governors, which define the policy for thermal management, play a pivotal role in ensuring the thermal safety of the device. Choosing the right one ensures the device performs optimally within the thermal budget.
In this presentation Keerthy Jagadeesh, co-maintainer of the TI bandgap and thermal driver, explores the behavior of existing governors such as step_wise, fair_share, and bang_bang on A15-based DRA7 SoCs as an example. Governors perform differently based on the number of cores the SoC packs, the process node, and the use cases. The results on the DRA7 family of SoCs will be used to provide guidelines for choosing a particular thermal governor for a given SoC based on the above-mentioned parameters.
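For reference, the active governor is exposed per thermal zone through sysfs, so you can see which policy each zone on your board is using. A minimal stdlib-only sketch (the sysfs paths assume a kernel with thermal support; on other systems the function simply returns an empty dict):

```python
from pathlib import Path

def thermal_policies(base="/sys/class/thermal"):
    """Return {zone_name: (current_governor, available_governors)} for each
    thermal zone, or an empty dict on systems without this sysfs interface."""
    zones = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            current = (zone / "policy").read_text().strip()
            available = (zone / "available_policies").read_text().split()
        except OSError:
            continue  # zone without a policy file, or unreadable
        zones[zone.name] = (current, available)
    return zones

# Switching governors is a privileged write to the same file, e.g.:
#   echo step_wise > /sys/class/thermal/thermal_zone0/policy
```

This only reads the kernel interface; picking the *right* governor per SoC is exactly the judgement call the talk is about.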
- 14:30 – 14:55 – SAN19-106 What’s new in VIXL 2019? by Tat Wai Chong, Senior Software Engineer, Arm
VIXL is an ARMv8 runtime code generation library which contains three components:
- Programmatic assemblers to generate A64, A32 or T32 code at runtime.
- Disassemblers that can print any instruction emitted by the assemblers.
- A simulator that can simulate any instruction emitted by the A64 assembler on x86 and Arm platforms. It is configurable (for example, the vector length for SVE), and it supports register tracing during execution.
In this talk, we’re going to introduce:
- What is VIXL? It is already deployed and considered “mature”; for example, it has been adopted by the Android ART compiler for its AArch64 and AArch32 backends.
- CPU feature management and detection.
- Support for new Armv8.x instructions, such as BTI and PAuth.
- New SVE (Scalable Vector Extension) support.
- 15:00 – 15:25 – SAN19-109 Device Tree Evolution Project by Bill Fletcher, Field Engineering, Linaro
Device Tree (DT) is a core technology that enables us to build flexible and adaptable embedded systems. Device Tree engineering work is occurring in various forums, but a number of features that are important to the ecosystem are languishing due to little focus or coordination.
Several topics have been identified as critical features that require leadership and engineering effort. This is a collaboration project to put some coordinated engineering effort into the identified features.
The session will introduce the project and the identified topics.
- 15:30 – 15:55 – SAN19-114 Holistic Audio Solution for Modern Embedded Devices by Patrick Lai, Qualcomm Senior Staff Engineer
Audio is ubiquitous across a wide range of phones, tablets, notebooks, speakers, headsets, appliances, routers, telematics, and other modern devices.
To fully utilize the potential of dedicated audio subsystems on SoCs and other embedded devices with minimal engineering investment, an open, modularized, and extensible signal processing framework, with associated uniform APIs, is proposed for high performance or cost sensitive and power efficient audio applications.
An extensible and modularized framework provides the flexibility, within the same architecture and codebase, to support a wide range of customizable features and capabilities without sacrificing the ability to scale up to higher performing, richer processing environments and also scale down to smaller, resource limited and cost sensitive environments.
This presentation describes the key design principles of this advanced audio signal processing framework including programming interfaces and development workflow using open source community friendly SDK’s and use case design, configuration, and tuning applications.
- 16:00 – 16:25 – SAN19-116 Project Zephyr Security Update by David Brown, Senior Engineer, Linaro
This talk will give an overview of the work by the security subcommittee within the Zephyr project, including the current status of security within the project. It will cover what happens when a vulnerability is reported, as well as ongoing efforts around static analysis.
- 16:30 – 16:55 – SAN19-121 TF-M remote secure services with Zephyr by Karl Zhang, Senior Software Engineer, Arm
Trusted Firmware M (TF-M) is an open source implementation of Platform Security Architecture (PSA) for Arm Cortex M processors. TF-M provides secure services to other cores or non-secure execution environments using PSA APIs on the M profile core. It includes services like secure storage, security audit trails, and crypto, amongst others. PSA Firmware Framework (PSA-FF) compliant APIs are used for inter-process or inter-processor communication with the secure services.
This session will discuss how to run Zephyr on a non-secure core, calling TF-M services on a secure TF-M core. A dual-core Cortex M33 will be used, with OpenAMP as the IPC protocol between the Zephyr and TF-M core. This session will also examine PSA level 1 requirements for PSA certification, such as the use of a secure boot loader.
- 8:30 – 8:55 – SAN19-201 Bring Kubernetes to the Arm64 edge node by K3s by Kevin Zhao, Tech Lead, Linaro
Nowadays everyone talks about Kubernetes. There are many scenarios for running Kubernetes, and it’s very easy to deploy applications with it. However, due to the limited resource capacity of edge nodes, deploying a full Kubernetes cluster on an edge node results in significant resource costs. Is there an easy way to bring Kubernetes to the edge node with fewer resources?
Using K3s on edge nodes offers a viable alternative. K3s is a lightweight Kubernetes distribution with easy installation that uses half the memory and ships as a single binary, designed for edge and IoT devices based on Arm64.
For easy management of several edge K3s clusters, we also run a Kubernetes cluster on the Arm64 datacenter side as the “root cluster” for metadata management and as a provisioner for all K3s clusters running on the edge nodes.
In this presentation, we will talk about how to run K3s on Arm64 edge nodes, and what we have done to make the Kubernetes cluster running on the datacenter side act as the root cluster managing the several K3s clusters on edge Arm64 nodes. This makes for a good reference architecture for running and managing workloads at the edge.
- 9:00 – 9:25 – SAN19-205 Boost JVM apps by using GPU by Dmitry Chuyko, JVM Engineer
Today the JVM remains one of the most popular programming and execution platforms. There are different approaches to leverage GPU power from the JVM, which can be useful in many specific cases. Arm-based hardware brings JVM benefits to the edge. This talk will demonstrate different ways of interoperability between the GPU and the JVM. We will evaluate the APIs and the performance of hybrid Java-GPU code. For the practical part of the talk, we will use the Jetson Nano as an example of modern, powerful, but affordable edge equipment.
- 11:00 – 11:50 – SAN19-210 Azure Sphere: Fitting Linux security into 4 MiB of RAM by Ryan Fairfax, Principal Engineering Lead, Microsoft
Azure Sphere is a solution for building highly secured, connected microcontroller-powered devices. It includes a customized version of the Linux kernel and work to fit the OS within a highly constrained memory footprint. In this talk we will cover the security components of the system, including a custom Linux Security Module, modifications and extensions to existing kernel components, and user space components that form the security backbone of the OS. Along the way we’ll discuss false starts, failed attempts, and the challenges of taking modern security techniques and fitting them in resource constrained devices.
- 11:30 – 11:55 – SAN19-211 ONNX & ONNX Runtime by Weixing Zhang, Microsoft, Senior Software Engineer
Microsoft and a community of partners created ONNX as an open standard for representing machine learning models. Models from many frameworks including TensorFlow, PyTorch, SciKit-Learn, Keras, Chainer, MXNet, and MATLAB can be exported or converted to the standard ONNX format. Once the models are in the ONNX format, they can be run on a variety of platforms and devices.
ONNX Runtime is a high-performance inference engine for deploying ONNX models to production. It’s optimized for both cloud and edge and works on Linux, Windows, and Mac. Written in C++, it also has C, Python, and C# APIs. ONNX Runtime provides support for all of the ONNX-ML specification and also integrates with accelerators on different hardware such as TensorRT on NVidia GPUs.
The ONNX Runtime is used in high scale Microsoft services such as Bing, Office, and Cognitive Services. Performance gains are dependent on a number of factors but these Microsoft services have seen an average 2x performance gain on CPU. ONNX Runtime is also used as part of Windows ML on hundreds of millions of devices. You can use the runtime with Azure Machine Learning services. By using ONNX Runtime, you can benefit from the extensive production-grade optimizations, testing, and ongoing improvements.
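The basic inference API mentioned above is small: load a model into a session, then feed named inputs. A hedged sketch of that flow (the onnxruntime package name and `InferenceSession`/`run` calls are the real API, but `model.onnx` and the input name `input` are placeholders, and the import is guarded in case the package isn’t installed):

```python
# Sketch of the core ONNX Runtime inference flow.
try:
    import onnxruntime as ort  # pip install onnxruntime
except ImportError:
    ort = None

def run_onnx(model_path, inputs):
    """Run one inference pass. `inputs` maps the graph's input names to
    arrays; the return value is the list of output arrays."""
    if ort is None:
        raise RuntimeError("onnxruntime is not installed")
    session = ort.InferenceSession(model_path)
    return session.run(None, inputs)  # None = return all outputs

# Hypothetical usage for an image classifier exported to ONNX:
#   outputs = run_onnx("model.onnx", {"input": batch_of_images})
```

Because the model file itself is the portable artifact, the same few lines work whether the session runs on CPU or is backed by an accelerator such as TensorRT.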
- 12:00 – 12:25 – SAN19-215 AI Benchmarks and IoT by Mark Charlebois, Director Engineering, Qualcomm Technologies Inc
There are several mobile and server AI benchmarks in use today and some new ones on the horizon. Which of these or others are applicable to IoT use cases? How do you meaningfully compare AI performance across the wide range of IoT HW with widely varying cost, memory, power and thermal constraints, and accuracy tradeoffs for quantized models vs non-quantized models? This talk will discuss these topics and some of the possible ways to address the issues.
- 12:30 – 12:55 – SAN19-218 Inference Engine Deployment on MCUs or Application Processors by Markus Levy, Director of ML Technologies, NXP Semiconductor
This session will describe how to apply Arm NN, CMSIS-NN, and GLOW to translate neural networks to inference engines running on MCUs or Application Processors.
- 14:00 – 14:25 – SAN19-219 Upstreaming ARM64 SoCs easier than before by Manivannan Sadhasivam, Applications Engineer, Linaro
This session is aimed at providing an overview of upstreaming ARM64 SoCs in the Linux kernel.
- 15:00 – 15:50 – SAN19-221 Gcc under the hood by Siddhesh Poyarekar, Tech Lead, Linaro
This session is a beginner tutorial that looks under the hood of the GCC compiler. In the process we take a look at some useful methods that allow developers to understand how GCC transforms their code into the target machine code.
- 16:00 – 16:50 – SAN19-222 Linux Kernel Mailbox API – 101 by Jassi Brar, Principal Engineer, Linaro
An introductory presentation about the mailbox concept, some common use cases, and the features and limitations of the mailbox API in the Linux kernel.
- 17:00 – 17:50 – SAN19-223 Using Perf and its friend eBPF on Arm platform by Leo Yan, software engineer, Linaro
Perf has joined a growing number of tools able to act as userspace interface to eBPF. Not only that but it can also reprise its historic role as the best interface to the Linux performance monitoring sub-system to profile eBPF programs installed by itself or any other eBPF front end.
This session will mainly give updates on the latest Arm CoreSight and eBPF support in perf, and will also show how Arm CoreSight can be used for eBPF program profiling.
The session is divided into two main parts. The first part focuses on updates to Arm CoreSight tracing, including sample flags and perf integration for test support. The second part discusses eBPF usage with the perf tool: perf uses an eBPF program for system call tracing, and perf profiles eBPF programs using general PMU events and the Arm CoreSight event.
The presentation itself will finish within 25 minutes; the CoreSight hacking session will then concentrate on questions and demonstrations as a supplement to the presentation.
- 11:00 – 11:25 – SAN19-305 The Transformation of Electronic Product Design by Gordon Kruberg, Dream, Design, Deliver
Dr. Kruberg will review and predict the future impact of modular software on the Arm ecosystem and cloud based electronic design and manufacturing of next generation electronics.
- 12:00 – 12:25 – SAN19-307 Robotic Arm Control using Qualcomm RB3 by Sahaj Sarup, Application Engineer, Linaro
A discussion and showcase of the work-in-progress “Robotic Arm Project” using the RB3 Robotics kit from Qualcomm.
The following topics will be covered:
- Servo control
- Object recognition using OpenCV
- Basic voice control
- 12:30 – 12:55 – SAN19-311 TVM – An End to End Deep Learning Compiler Stack by Animesh Jain, Applied Scientist, AWS
AWS is a leading cloud-service provider with the goal of providing the best customer experience. ARM has a unique place in the whole ecosystem – both at server and edge devices. In this talk, I will explain how AWS Sagemaker Neo accelerates deep learning on EC2 ARM A1 instances and ARM-based edge devices to improve customer experience. AWS Sagemaker Neo uses TVM, an open-source end-to-end deep learning compiler stack.
- 14:00 – 14:50 – SAN19-312 Arm Everywhere: A Demo of an Arm Cloud, Edge, and IoT Infrastructure by David Tischler, Founder, miniNodes
The past 6 months have seen much written on the topic of bringing workloads back from the Cloud, and moving them to the Edge, closer to the end users or to IoT endpoints, and improving the service delivery experience. While there have been many articles, slides, headlines, and conversations about this, no one has yet demonstrated a full end-to-end working Arm-based implementation. miniNodes is building a complete demonstration of connected Cloud Servers, Edge Servers, and IoT Devices, running entirely on Arm. Environmental data will be captured by IoT endpoints running Arm Mbed, provisioned via Arm Pelion, feeding data to Edge servers, that will in turn connect to an Ampere eMAG server hosted by Packet.com.
More specifically, the IoT endpoints are collecting environmental readings such as temperature, humidity, air quality, particulates, and lightning detection from a series of Raspberry Pis distributed across the globe.
Regional 96Boards Edge Servers are collecting data from the IoT endpoints in their assigned zone, and packaging the data for shipment to the cloud server.
An Ampere Cloud server hosted by Packet is doing the large data processing activities, and running Grafana Dashboard for visualization of the IoT data flowing into the system.
The entire collection of systems will be centrally managed, and thanks to the Pelion application the IoT nodes allow service provisioning via containers pushed to the devices, no matter their location.
- 15:00 – 15:25 – SAN19-313 Using Python Overlays to Experiment with Neural Networks by Bryan Fletcher, Technical Marketing Director, Avnet
Python Productivity for Zynq, or PYNQ, has the ability to present programmable logic circuits as hardware libraries called overlays. These overlays are analogous to software libraries. A software engineer can select the overlay that best matches their application. The overlay can be accessed through an application programming interface (API). Using existing community overlays, this course will examine how to experiment with neural networks using PYNQ on Ultra96.
- 16:00 – 16:50 – SAN19-314 Developing with PetaLinux for the Ultra96 Board by Tom Curran, Avnet, Sr. Technical Marketing Engineer
This course will describe Linux development using the Xilinx PetaLinux tools for the Ultra96 board. Specific focus will be given to lessons learned in integrating and debugging device drivers.
- 8:30 – 8:55 – SAN19-403 Code size improvement work in TCWG by Peter Smith, Principal Engineer, Arm
For many projects that use resource constrained devices, optimizing for the smallest code-size is often more important than optimizing for the highest performance. The TCWG team would like to share their progress and results on several code-size related projects. These include:
- Comparing the code-size of clang and gcc for bare metal programs on M-profile devices.
- Adding Arm support to the LLVM machine outliner.
- Adding C++ virtual function elimination to Clang.
- Building zephyr using GCC LTO (Link Time Optimization).
The presentation will give a brief summary of how the clang and gcc compilers compare on code-size, and a description of some improvements you can expect in future versions of the compilers.
- 9:00 – 9:25 – SAN19-406 Secure Runtime Library on IoT Device by Ken Liu, Staff Software Engineer, Arm
When isolation levels greater than 1 are involved in PSA certification, the existing runtime library for secure partitions lacks security considerations and contains its own private data; this prevents secure partitions from calling these APIs because of potential information leakage.
A new runtime library needs to be available for secure partitions, with security considered from the very start of the design. The design must not break the isolation requirements listed in the PSA Firmware Framework specification. This runtime library also needs to be sharable by all secure partitions to save storage on IoT devices, and it needs to be read-only to avoid tampering. Most importantly, no private data can exist inside the runtime library.
This new runtime library would keep security isolation considerations away from secure partition designers, which unifies the development environment for secure partition developers, and it saves space for IoT software since the library is shared.
- 11:00 – 11:25 – SAN19-408 Performance improvements in Open Source C/C++ Toolchains for Arm by James Greenhalgh, Principal Engineer, Arm
Performance optimizations underpin great advances in the system efficiency of Arm-based devices, with C and C++ toolchains at the heart of code-generation technology for the Arm architecture. In this session I will give an overview of the work of the C/C++ compiler performance team at Arm, and discuss our recent successes and priorities for the coming year.
- 11:30 – 11:55 – SAN19-413 TEE based Trusted Keys in Linux by Sumit Garg, Software Engineer, Linaro
Protecting key confidentiality is essential for many kernel security use-cases such as disk encryption, file encryption and protecting the integrity of file metadata. Trusted and encrypted keys provides a mechanism to export keys to user-space for storage as an encrypted blob and for the user-space to later reload them onto Linux keyring without the user-space knowing the encryption key. The existing Trusted Keys implementation relied on a TPM device but what if you are working on a system without one?
This session will introduce a Trusted Keys implementation which relies on a much simpler trusted application running in a Trusted Execution Environment (TEE) for sealing and unsealing of Trusted Keys using a hardware unique key provided by the TEE.
- 12:00 – 12:25 – SAN19-416 Transforming kernel developer workflows with CI/CD by Major Hayden, Principal Software Engineer, Red Hat
26 million lines of code. 750,000 commits. 61,000 files. “Continuous integration and deployment of the Linux kernel is impossible”, they said. We believe it’s definitely within reach.
The Continuous Kernel Integration (CKI) project wants to fundamentally change the kernel developer workflow by adding continuous integration and continuous deployment (CI/CD). In this talk, the audience will embark on a journey of triumph and tragedy through the experiences of a small team at Red Hat.
Major Hayden, principal software engineer at Red Hat, will explain how kernels are built and tested within the CKI infrastructure and what testing is already in place today. He will take a deep dive into the infrastructure components (including Gitlab, Jenkins, and containers) and the optimizations that allow for rapid testing of the Linux kernel. Members of the audience will also learn how they can get involved in the project.
- 12:30 – 12:55 – SAN19-419 Why you should use the SCHED_IDLE CFS scheduling policy by Viresh Kumar, Engineer, Linaro
The CFS scheduler has multiple policies, and SCHED_IDLE is one of them. Thanks to some recent scheduler optimizations around SCHED_IDLE, it has become an interesting policy that users should consider going forward.
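For context, SCHED_IDLE can be selected per task with the `sched_setscheduler()` system call, and lowering a normal task to it needs no special privileges. A Linux-only sketch using Python’s stdlib wrapper:

```python
import os

def demote_to_idle():
    """Switch the calling process to SCHED_IDLE, so it only gets CPU time
    the system would otherwise spend idle. Linux-only; moving a normal
    (SCHED_OTHER) task down to SCHED_IDLE requires no privileges."""
    if not hasattr(os, "SCHED_IDLE"):
        raise RuntimeError("SCHED_IDLE is only available on Linux")
    # pid 0 means "the calling process"; SCHED_IDLE takes priority 0
    os.sched_setscheduler(0, os.SCHED_IDLE, os.sched_param(0))
    return os.sched_getscheduler(0) == os.SCHED_IDLE
```

Note that moving back from SCHED_IDLE to a normal policy does require privileges, so a batch worker typically demotes itself once at startup and stays there.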
- 14:00 – 14:50 – SAN19-421 Training: Device power management for idle by Ulf Hansson, Senior Kernel Engineer, Linaro
Arm platforms often support sophisticated power management, for example to allow unused parts of a running system to be put into low power states, which prevents energy from being drained.
However, it can be a rather complicated task to deploy optimized power management support in a driver in the Linux kernel, especially when it comes to idle management. A couple of frameworks are there to help, and these come with corresponding callback functions that may be assigned on a per-device basis. The driver developer needs detailed knowledge of these frameworks, especially when the goal is to reach the most energy-efficient behavior possible.
In this session, we look into the concepts for system wide suspend and the corresponding low power states, such as suspend to ram, suspend to idle and suspend to disk.
Additionally, for more fine grained power management per device, some best practices are explained of how to deploy support for runtime PM and PM domains (in particular the generic PM domain).
- 15:00 – 15:50 – SAN19-423 Git tricks by Viresh Kumar, Engineer, Linaro
Sharing Git tricks that can make working with Git more efficient.
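As a taste of the kind of trick likely to come up, here is a self-contained sketch of the fixup/autosquash workflow, which folds a follow-up correction back into the commit it amends without manual interactive editing. It scripts real git commands from Python in a throwaway repository (assumes `git` is on the PATH; the file and commit names are illustrative):

```python
import os
import subprocess
import tempfile
from pathlib import Path

def autosquash_demo():
    """Demonstrate `git commit --fixup` + `git rebase --autosquash`:
    the fixup commit disappears into the commit it targets."""
    repo = Path(tempfile.mkdtemp())
    env = {**os.environ,
           "GIT_AUTHOR_NAME": "demo", "GIT_AUTHOR_EMAIL": "demo@example.com",
           "GIT_COMMITTER_NAME": "demo", "GIT_COMMITTER_EMAIL": "demo@example.com",
           # 'true' accepts the generated rebase todo list unchanged,
           # so the interactive rebase runs without an editor
           "GIT_SEQUENCE_EDITOR": "true"}

    def git(*args):
        subprocess.run(["git", *args], cwd=repo, env=env,
                       check=True, capture_output=True)

    git("init", "-q")
    (repo / "f").write_text("one\n")
    git("add", "f")
    git("commit", "-qm", "add feature")
    (repo / "f").write_text("two\n")
    git("add", "f")
    git("commit", "-q", "--fixup", "HEAD")      # creates "fixup! add feature"
    git("rebase", "-i", "--autosquash", "--root")
    count = subprocess.run(["git", "rev-list", "--count", "HEAD"],
                           cwd=repo, env=env, check=True,
                           capture_output=True, text=True)
    return int(count.stdout.strip())
```

After the rebase, `autosquash_demo()` reports a single commit: the fixup has been squashed away while its content survives.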
- 16:00 – 16:50 – SAN19-424 Event Tracing and Pstore with a pinch of Dynamic debug by Sai Prakash Ranjan, Kernel Engineer at Qualcomm
Event tracing is one of the most powerful debug features available in the Linux kernel as part of ftrace. Pstore, or persistent storage, on the other hand, is a boon for finding the cause of the kernel’s dying breath, as someone rightly said, and is widely used in production environments. When these two features are combined with a pinch of dynamic debug, we get a full recipe for debugging problems in the Linux kernel.
This presentation talks about integrating event tracing with pstore to identify and root-cause problems by analyzing the last few events before the kernel says goodbye. In addition, we add dynamic debug support to filter out unwanted logs and limit tracing to only specific files or directories, which helps narrow problems down to specific subsystems and is not currently supported by ftrace.
- 8:30 – 8:55 – SAN19-501 WPEWebKit, the WebKit port for embedded platforms by Philippe Normand, Multimedia engineer and Partner at Igalia
WPEWebKit is a WebKit flavor (also known as a port) specially crafted for embedded platforms and use cases. During this talk I will present WPEWebKit’s architecture, with a special emphasis on its multimedia backend, which is based on GStreamer and implements support for the MSE (Media Source Extensions), EME (Encrypted Media Extensions), and Media Capabilities specifications. I will also present a case study on how to successfully integrate WPEWebKit on i.MX6 and i.MX8M platforms with the Cog standalone reference web-app container, or within existing Qt5 applications using the WPEQt QML plugin.
- 9:00 – 9:25 – SAN19-508 Community Driven Firmware Open Source Project by Jammy Zhou, Solutions Director, Linaro
We have experienced very successful open source development for the Linux operating system. But in the firmware area, most development is carried out by a few major organizations without the participation of the community.
Application-driven open source development enables a software ecosystem which can adopt various hardware components, including different architectures, and lets vendors differentiate and create tangible results. Open source firmware and a standard firmware interface are critical to enabling different hardware implementations for the same ecosystem. In this session we will discuss how to join and contribute to an open source firmware project managed by Linaro.
The key principles of this project are as follows:
- General System Firmware not specifically targeted for phone, client, server or cloud.
- Universal interface supporting multiple OSes including Linux and Windows
- Adaptive to various silicon, especially silicon provided by member companies
- Encourage open technology and early standard implementation
- New TianoCore license model or similar
- Long-term stable releases instead of product-driven ones
- 11:00 – 11:25 – SAN19-510 Is Chromium OS favorable for IOT Devices? by Khasim Syed Mohammed, Senior Android Engineer, Linaro Ltd
Chromium OS, an open source software distribution maintained by Google for almost a decade now, has made its way into IoT segments.
In this session we look into the key components in Chromium OS that can help us overcome the software challenges in building next generation IOT devices.
- Chromium OS was initially designed for laptops, desktops, and standalone / all-in-one boxes; its well-integrated software components like browsers, networking, security, boot options, and window management can be leveraged as-is for IoT devices. We go through a few of these components to understand whether they meet IoT requirements.
- Chromium OS is built on Linux, so the OS has support for many of the latest SoCs and device drivers for various controllers. We will discuss whether Chromium OS has picked up all the latest trends in Linux related to power management, security, upgraded device driver frameworks, etc.
- In general, IoT devices are headless or less UI-centric; we explore whether Chromium OS can be easily configured to boot on a headless unit.
- Understanding the system requirements, memory requirements, and power utilization are a few key factors to consider when deciding if the OS fits the budget available for the end equipment.
- The new set of IoT devices are now pre-integrated with AI algorithms to help end users better understand their surroundings, aid in decision making, and so on. The devices are also self-learning with ML algorithms. It’s important to know whether Chromium OS has frameworks to download AI/ML algorithms or firmware at run time onto DSPs or SoCs.
The topics covered in this session should help us quickly assess whether Chromium OS is favorable for IoT devices, or whether we would be bringing an elephant into the room.
- 11:30 – 11:55 – SAN19-514 Graal Compiler Optimizations on AArch64 by Xiaohong Gong (Arm Technology China)
Graal is a dynamic compiler that integrates with the HotSpot JVM and converts Java bytecode to native machine code at runtime. It can be a replacement for the C2 compiler in HotSpot, with the basic advantage that Graal is written in Java rather than C++, which arguably makes it safer and easier to maintain and extend. Besides, the Graal compiler has a focus on high performance, so it’s also a big part of what makes Java as fast as it is.
Graal has already added many optimization mechanisms like speculative optimizations, inlining, partial escape analysis, lowering snippets, etc. Even so, compared to the C2 compiler, Graal still lacks some optimizations and new features of OpenJDK, especially for AArch64, and its performance could be better.
This presentation explores the status of Graal optimization on AArch64, together with performance data from some benchmarks. Some of the focus is on recent changes and improvements in the AArch64 port, which Arm contributes to. Some future work may also be introduced.
That will be all for my virtual schedule. There are also keynote sessions through the 5-day event, and Friday afternoon will be reserved for demos. You’ll find the full schedule here.
If you are not an employee of a Linaro member company, you can still attend Linaro Connect. The one-day fee is $750, while a 5-day pass is $2,500, and you can register online.