Posts Tagged ‘arm’

Arm Research Summit 2017 Streamed Live on September 11-13

September 11th, 2017 2 comments

The Arm Research Summit is “an academic summit to discuss future trends and disruptive technologies across all sectors of computing”, with the second edition of the even taking place now in Cambridge, UK until September 13, 2017.

Click to Enlarge

The Agenda includes various subjects such as architecture and memory, IoT, HPC, computer vision, machine learning, security, servers, biotechnology and others. You can find the full detailed schedule for each day on Arm website, and the good news is that the talks are streamed live in YouTube, so you can follow the talks that interest you from the comfort of your home/office.

Note that you can switch between rooms in the stream above by clicking on <-> icon. Audio volume is a little low…

Thanks to Nobe for the tip.

Linux 4.13 Release – Main Changes, ARM & MIPS Architectures

September 4th, 2017 6 comments

Linus Torvalds has just announced the release of Linux 4.13 and a kidney stone…:

So last week was actually somewhat eventful, but not enough to push me to delay 4.13.

Most of the changes since rc7 are actually networking fixes, the bulk of them to various drivers. With apologies to the authors of said patches, they don’t look all that interesting (which is definitely exactly what you want just before a release). Details in the appended shortlog.

Note that the shortlog below is obviously only since rc7 – the _full_4.13 log is much too big to post and nobody sane would read it. So if you’re interested in all the rest of it, get the git tree and limit the logs to the files you are interested in if you crave details.

No, the excitement was largely in the mmu notification layer, where we had a fairly last-minute regression and some discussion about the problem. Lots of kudos to Jérôme Glisse for jumping on it, and implementing the fix.

What’s nice to see is that the regression pointed out a nasty and not very well documented (or thought out) part of the mmu notifiers, and the fix not only fixed the problem, but did so by cleaning up and documenting what the right behavior should be, and furthermore did so by getting rid of the problematic notifier and actually removing almost two hundred lines in the process.

I love seeing those kinds of fixes. Better, smaller, code.

The other excitement this week was purely personal, consisting of seven hours of pure agony due to a kidney stone. I’m all good, but it sure _felt_ a lot longer than seven hours, and I don’t even want to imagine what it is for people that have had the experience drag out for longer. Ugh.

Anyway, on to actual 4.13 issues.

While we’ve had lots of changes all over (4.13 was not particularly big, but even a “solidly average” release is not exactly small), one very _small_ change merits some extra attention, because it’s one of those very rare changes where we change behavior due to security issues, and where people may need to be aware of that behavior change when upgrading.

This time it’s not really a kernel security issue, but a generic protocol security issue.

The change in question is simply changing the default cifs behavior: instead of defaulting to SMB 1.0 (which you really should not use: just google for “stop using SMB1” or similar), the default cifs mount now defaults to a rather more modern SMB 3.0.

Now, because you shouldn’t have been using SMB1 anyway, this shouldn’t affect anybody. But guess what? It almost certainly does affect some people, because they blithely continued using SMB1 without really thinking about it.

And you certainly _can_ continue to use SMB1, but due to the default change, now you need to be *aware* of it. You may need to add an explicit “vers=1.0” to your mount options in /etc/fstab or similar if you *really* want SMB1.

But if the new default of 3.0 doesn’t work (because you still use a pterodactyl as a windshield wiper), before you go all the way back to the bad old days and use that “vers=1.0”, you might want to try “vers=2.1”. Because let’s face it, SMB1 is just bad, bad, bad.

Anyway, most people won’t notice at all. And the ones that do notice can check their current situation (just look at the output of “mount” and see if you have any cifs things there), and you really should update from the default even if you are *not* upgrading kernels.

Ok, enough about that. It was literally a two-liner change top defaults – out of the million or so lines of the full 4.13 patch changing real code.

Go get the new kernel,


Two months ago, Linux 4.12 was released with initial support for AMD Radeon RX Vega GPU, BFQ (Budget Fair Queuing) and Kyber block I/O schedulers, AnalyzeBoot tool for the kernel, “hybrid consistency model” implementation for live kernel patching, but disabled the Open Sound System, and removed AVR32 support, among many other changes.

Some interesting changes in Linux 4.13 – mostly based on LWN 4.13 Merge Window part 1 & part 2 – include:

  • Support for non-blocking buffered I/O operations added at the block level, which should also improve asynchronous I/O support when used with buffered I/O.
  • AppArmor security module’s “domain labeling” code has been merged into the mainline. It was maintained by Ubuntu out of tree previously.
  • Kernel-based TLS implementation that should deliver better performance for HTTPS, and other protocol relying on TLS.
  • CIFS/SAMBA now defaults to v3.0 instead of v1.0 due to security issues
  • File System Changes – EXT-4: support for to ~2 billion files per directory with largedir option, extended attributes up to 64KB, new deduplication feature; f2fs: supports disk quotas; overlayfs union: new “index directory” feature that makes copy-up operations work without breaking hard links.

Changes specific to ARM include:

  • Rockchip:
    • Added support for RV1108 SoC for camera applications
    • Rockchip IOMMU driver is now available on ARM64
    • PCIe – configure Rockchip MPS and reorganize + use normal register bank
    • Clock driver for Rockchip RK3128 SoC
    • Rockchip pinctrl driver now supports iomux-route switching for RK3228, RK3328 and RK3399
    • Sound driver – Support for Rockchip PDM controllers
    • Device tree
      • Added RK3399-Firefly SBC
      • Added ARM Mali GPU
      • Added cru
      • Added sdmmc, sdio, emmc nodes for Rockchip RK3328
  • Amlogic
    • Updated CEC EE clock support
    • Enabled clock controller for 32-bit Meson8
    • Device tree changes
      • Meson UARTs
      • new SPI controller driver
      • HDMI & CVBS for multiple boards
      • new pinctrl pins for SPI, HDMI CEC, PWM
      • Ethernet Link and Activity LEDs pin nodes
      • SAR ADC support for Meson8 & Meson8b
    • Defconfig changes – Meson SPICC enabled as module; IR core, decoders and Meson IR device enabled;
    • New boards & devices: NanoPi K2, Libre Computer SBC, R-Box Pro
  • Samsung
    • Clock driver updated for Samsung Exynos 5420 audio clocks, and converted code to clk_hw registration APIs
    • Pinctrl drivers split per ARMv7 and ARMv8 since there’s no need to compile everything on each of them
    • ARM DT updates:
      • Add HDMI CEC to Exynos5 SoCs + needed property for CEC on Odroid U3
      • Fix reset GPIO polarity on Rinato
      • Minor cleanups and readability improvements.
    • ARM64 DT updates:
      • Remove unneeded TE interrupt gpio property
    • Defconfig changes – Some cleanups, enabled Exynos PRNG along with user-space crypto API.
  • Qualcomm
    • Clock & pinctrl drivers for Qualcomm IPQ8074
    • Add debug UART addresses for IPQ4019
    • Improve QCOM SMSM error handling
    • Defconfig
      • Enable HWSPINLOCK & RPMSG_QCOM_SMD to get some Qualcomm boards to work out of the box/again
      • Enable IPQ4019 clock and pinctrl
    • Mailbox – New controller driver for Qualcomm’s APCS IPC
    • RPMsg – Qualcomm GLINK protocol driver and DeviceTree-based modalias support, as well as a number of smaller fixes
    • Qualcomm Device Tree Changes
      • Fix IPQ4019 i2c0 node
      •  Add GSBI7 on IPQ8064
      • Add misc APQ8060 devices
      • Fixup USB related devices on APQ8064 and MSM8974
    • Qualcomm ARM64 Updates for v4.12
      • Fix APQ8016 SBC WLAN LED
      • Add MSM8996 CPU node
      • Add MSM8992 SMEM and fixed regulator
      • Fixup MSM8916 USB support
  • Mediatek
    • CPU clks for Mediatek MT8173/MT2701/MT7623 SoCs
    • Pinctrl – Serious code size cut for MT7623
    • Mediatek “scpsys” system controller support for MT6797
    • Device tree
      • Added support for MT6797 (Helio X20) mobile SoC and evaluation board
      • Extended MT7623 support significantly
      • Added MT2701 i2c device & JPEG decoder nodes
  • Other new ARM hardware platforms and SoCs:
    • STM32 – stm32h743-disco, stm32f746-disco, and stm32f769-disco boards; Drivers for digital audio interfaces, S/PDIF receiver, digital camera interfaces, HDMI CEC, watchdog timer
    • NXP – Gateworks Ventana GW5600 SBC;  Technexion Pico i.MX7D board; i.MX5/6 image processing units & camera sensor interfaces
    • Realtek – Initial support for Realtek RTD1295 SoC and Zidoo X9S set-top-box
    • Actions Semi – Initial support for Actions Semi S900 / S500, and corresponding LeMaker Guitar & Bubblegum-96 SBCs
    • Renesas – Salvator-XS and H3ULCB automotive development systems; GR-Peach board, iWave G20D-Q7 System-on-Module plus
    • Socionext- Support for Uniphier board support for LD11-global and LD20-global
    • Broadcom – Stingray communication processor and two reference boards;
    • Marvell – Linksys WRT3200ACM router
    • Texas Instruments – BeagleBone Blue
    • Microchip / Atmel – MMU-less ARM Cortex-M7 SoCs (SAME70/V71/S70/V70)

Some of the changes specific to MIPS include:

  • Boston platform support – Document DT bindings; Add CLK driver for board clocks
  • CM – Avoid per-core locking with CM3 & higher; WARN on attempt to lock invalid VP, not BUG
  • CPS – Select CONFIG_SYS_SUPPORTS_SCHED_SMT for MIPSr6; Prevent multi-core with dcache aliasing; Handle cores not powering down more gracefully; Handle spurious VP starts more gracefully
  • DSP – Add lwx & lhx missaligned access support
  • eBPF – Add MIPS support along with many supporting change to add the required infrastructure
  • Generic arch code:
    • Misc sysmips MIPS_ATOMIC_SET fixes
    • Negate error syscall return in trace
    • Correct forced syscall errors
    • Traced negative syscalls should return -ENOSYS
    • Allow samples/bpf/tracex5 to access syscall arguments for sane
    • Cleanup from old Kconfig options in defconfigs
    • Fix PREF instruction usage by memcpy for MIPS R6
    • Fix various special cases in the FPU eulation
    • Fix some special cases in MIPS16e2 support
    • Fix MIPS I ISA /proc/cpuinfo reporting
    • Sort MIPS Kconfig alphabetically
    • Fix minimum alignment requirement of IRQ stack as required by ABI / GCC
    • Fix special cases in the module loader
    • Perform post-DMA cache flushes on systems with MAARs
    • Probe the I6500 CPU
    • Cleanup cmpxchg and add support for 1 and 2 byte operations
    • Use queued read/write locks (qrwlock)
    • Use queued spinlocks (qspinlock)
    • Add CPU shared FTLB feature detection
    • Handle tlbex-tlbp race condition
    • Allow storing pgd in C0_CONTEXT for MIPSr6
    • Use current_cpu_type() in m4kc_tlbp_war()
    • Support Boston in the generic kernel
  • Generic platform:
    • yamon-dt: Pull YAMON DT shim code out of SEAD-3 board;  Support > 256MB of RAM;  Use serial* rather than uart* aliases
    • Abstract FDT fixup application
    • Set RTC_ALWAYS_BCD to 0
    • Add a MAINTAINERS entry
  • core kernel – qspinlock.c: include linux/prefetch.h
  • Add support for Loongson 3
  • Perf – Add I6500 support
  • SEAD-3 – Remove GIC timer from DT; set interrupt-parent per-device, not at root node; fix GIC interrupt specifiers
  • SMP – Skip IPI setup if we only have a single CPU
  • VDSO – Make comment match reality; improvements to time code in VDSO”
  • Various fixes:
    • compressed boot: Ignore a generated .c file
    • VDSO: Fix a register clobber list
    • DECstation: Fix an int-handler.S CPU_DADDI_WORKAROUNDS regression
    • Octeon: Fix recent cleanups that cleaned away a bit too much thus breaking the arch side of the EDAC and USB drivers.
    • uasm: Fix duplicate const in “const struct foo const bar[]” which GCC 7.1 no longer accepts.
    • Fix race on setting and getting cpu_online_mask
    • Fix preemption issue. To do so cleanly introduce macro to get the size of L3 cache line.
    • Revert include cleanup that sometimes results in build error
    • MicroMIPS uses bit 0 of the PC to indicate microMIPS mode. Make sure this bit is set for kernel entry as well.
    • Prevent configuring the kernel for both microMIPS and MT. There are no such CPUs currently and thus the combination is unsupported and results in build errors.
    • ralink: mt7620: Add missing header

You can read the full Linux 4.13 changelog – with comments only – generated using git log v4.12..v4.13 --stat for the full details, and eventually kernelnewsbies’s Linux 4.13 changelog will be updated with an extensive list of chances.

The Stress Terminal UI (s-tui) is a Pretty CPU and Temperature Monitoring Terminal App

August 30th, 2017 12 comments

While it’s possible to do monitoring with tools like RPI-Monitor on headless or remote systems, top and htop are likely the commonly used tools to monitor CPU and process usage in the terminal. There’s now a new and different option with the Stress Terminal UI that display pretty charts for frequency, CPU usage, and temperature in the terminal, and as its name implies it can also stress the system.

I’ve first installed it in my main computer running Ubuntu 16.04.2 as follows:

and then just started it

It took the screenshot above after enabling stress operation for a few seconds, and while frequency and CPU utilization in percent are updated properly, temperature is not, at least on my system. I had to enable “Smooth Graph” option to see any changes in the first two charts. I tried to run the app again with sudo, but still no temperature update, and that’s because the program is confused since I have two “temp1” values in my machine, one for “it8720” that will always show a constant temperature (45 °C) and one for “AMD CPU” that will vary depending on the load.

Click to Enlarge

A good solution would have to have an option to select the sensor, but there’s no such option for s-tui right now:

I switched to an ARM platform, namely NanoPi NEO board:

So it does not work with Python 3, and requires Python 2.7. Let’s give it another try:

I could install it and run it, but only CPU and temperature charts would be drawn.

The temperature sensor may have not been detected due to the following error:

msr appears to be be something called “model-specific registers”, and only used for x86 CPU, so ARM detection did not work or was not performed here. The developers did test the program successfully with a Raspberry Pi 3 board however. He’s also aware of issues with temperature sensors:

If the temperature does not show up, your sensor might not be supported. Try opening an issue on github and we can look into it

So overall, it’s still work in progress. Support for showing info for multiple cores would be nice, but probably only suitable in a full screen terminal.

Via It’s FOSS

Categories: Linux, Testing, Ubuntu Tags: arm, how-to, Linux, tool, ubuntu, x86

Embedded Linux Conference & Open Source Summit Europe 2017 Schedule

August 27th, 2017 3 comments

The Embedded Linux Conference & IoT summit 2017 took place in the US earlier this year in February, but there will soon be a similar event with the Embedded Linux Conference *& Open Source Summit Europe 2017 to take up in Europe on October 23 – 25 in Prague, Czech Republic, and the Linux Foundation has just published the schedule. It’s always useful to find out what is being discussed during such events, even if you are not going to attend, so I went through the different sessions, and compose my own virtual schedule with some of the ones I find the most interesting.

Monday, October 23

  • 11:15 – 11:55 – An Introduction to SPI-NOR Subsystem – Vignesh Raghavendra, Texas Instruments India

Modern day embedded systems have dedicated SPI controllers to support NOR flashes. They have many hardware level features to increase the ease and efficiency of accessing SPI NOR flashes and also support different SPI bus widths and speeds.

In order to support such advanced SPI NOR controllers, SPI-NOR framework was introduced under Memory Technology Devices (MTD). This presentation aims at providing an overview of SPI-NOR framework, different types of NOR flashes supported (like SPI/QSPI/OSPI) and interaction with SPI framework. It also provides an overview of how to write a new controller driver or add support for a new flash device.

The presentation then covers generic improvements done and proposed while working on improving QSPI performance on a TI SoC, challenges associated when using DMA with these controllers and other limitations of the framework.

  • 12:05 – 12:45 – Free and Open Source Software Tools for Making Open Source Hardware – Leon Anavi, Konsulko Group

The open source hardware movement is becoming more and more popular. But is it worth making open source hardware if it has been designed with expensive proprietary software? In this presentation, Leon Anavi will share his experience how to use free and open source software for making high-quality entirely open source devices: from the designing the PCB with KiCAD through making a case with OpenSCAD or FreeCAD to slicing with Cura and 3D printing. The talk will also provide information about open source hardware licenses, getting started guidelines, tips for avoiding common pitfalls and mistakes. The challenges of prototyping and low-volume manufacturing with both SMT and THT will be also discussed.

  • 14:20 – 15:00 – Introduction to SoC+FPGA – Marek Vašut, DENX Software Engineering GmbH

In this talk, Marek introduces the increasingly popular single-chip SoC+FPGA solutions. At the beginning, the diverse chip offerings from multiple vendors are introduced, ranging from the smallest IoT-grade solutions all the way to large industrial-level chips with focus on their software support. Mainline U-Boot and Linux support for such chips is quite complete, and already deployed in production. Marek demonstrates how to load and operate the FPGA part in both U-Boot and Linux, which recently gained FPGA manager support. Yet to fully leverage the potential of the FPGA manager in combination with Device Tree (DT) Overlays, patches are still needed. Marek explains how the FPGA manager and the DT Overlays work, how they fit together and how to use them to obtain a great experience on SoC+FPGA, while pointing out various pitfalls.

  • 15:10 – 15:50 – Cheap Complex Cameras – Pavel Machek, DENX Software Engineering GmbH

Cameras in phones are different from webcams: their main purpose is to take high-resolution still pictures. Running preview in high resolution is not feasible, so resolution switch is needed just before taking final picture. There are currently no applications for still photography that work with mainline kernel. (Pavel is working on… two, but both have some limitations). libv4l2 is doing internal processing in 8-bit, which is not enough for digital photography. Cell phones have 10 to 12-bit sensors, some DSLRs do 14-bit depth.

Differences do not end here. Cell phone camera can produce reasonable picture, but it needs complex software support. Auto-exposure / auto-gain is a must for producing anything but completely black or completely white frames. Users expect auto-focus, and it is necessary for reasonable pictures in macro range, requiring real-time processing.

  • 16:20 – 17:00 – Bluetooth Mesh with Zephyr OS and Linux – Johan Hedberg, Open Source Technology Center, Intel

Bluetooth Mesh is a new standard that opens a whole new wave of low-power wireless use cases. It extends the range of communication from a single peer-to-peer connection to a true mesh topology covering large areas, such as an entire building. This paves the way for both home and industrial automation applications. Typical home scenarios include things like controlling the lights in your apartment or adjusting the thermostat. Although Bluetooth 5 was released end of last year, Bluetooth Mesh can be implemented on any device supporting Bluetooth 4.0 or later. This means that we’ll likely see very rapid market adoption of the feature.

The presentation will give an introduction to Bluetooth Mesh, covering how it works and what kind of features it provides. The talk will also give an overview of Bluetooth Mesh support in Zephyr OS and Linux and how to create wireless solutions with them.

  • 17:10 – 17:50 – printk() – The Most Useful Tool is Now Showing its Age – Steven Rostedt, VMware

printk() has been the tool for debugging the Linux kernel and for being the display mechanism for Linux as long as Linux has been around. It’s the first thing one sees as the life of the kernel begins, from the kernel banner and the last message at shutdown. It’s critical as people take pictures of a kernel oops to send to the kernel developers to fix a bug, or to display on social media when that oops happens on the monitor on the back of an airplane seat in front of you.

But printk() is not a trivial utility. It serves many functionalities and some of them can be conflicting. Today with Linux running on machines with hundreds of CPUs, printk() can actually be the cause of live locks. This talk will discuss all the issues that printk() has today, and some of the possible solutions that may be discussed at Kernel Summit.

  • 18:00 – 18:45 – BoF: Embedded Linux Size – Michael Opdenacker, Free Electrons

This “Birds of a Feather” session will start by a quick update on available resources and recent efforts to reduce the size of the Linux kernel and the filesystem it uses.

An ARM based system running the mainline kernel with about 3 MB of RAM will also be demonstrated. If you are interested in the size topic, please join this BoF and share your experience, the resources you have found and your ideas for further size reduction techniques!

Tuesday, October 24

  • 10:55 – 11:35 – Introducing the “Lab in a Box” Concept – Patrick Titiano & Kevin Hilman, BayLibre

Continuous Integration (CI) has been a hot topic for long time. With the growing number of architectures and boards, it becomes impossible for maintainers to validate a patch on all configurations, making it harder and harder to keep the same quality level without leveraging CI and test automation. Recent initiatives like LAVA,, Fuego, (…) started providing a first answer, however the learning curve remains high, and the HW setup part is not covered.

Baylibre, already involved in, decided, as part of the AGL project, to go one step further in CI automation and has developed a turnkey solution for developers and companies willing to instantiate a LAVA lab; called “Lab in a Box”, it aims at simplifying the configuration of a board farm (HW, SW).

Motivations, challenges, benefits and results will be discussed, with a demo of a first “Lab in a Box” instantiation.

  • 11:45 – 12:25 – Protecting Your System from the Scum of the Universe – Gilad Ben-Yossef, Arm Holdings

Linux based systems have a plethora of security related mechanisms: DM-Crypt, DM-Verity, Secure Boot, the new TEE sub-system, FScrypt and IMA are just a few examples. This talk will describe these the various systems and provide a practical walk through of how to mix and match these mechanisms and design them into a Linux based embedded system in order to strengthen the system resilience to various nefarious attacks, whether the system discussed is a mobile phone, a tablet, a network attached DVR, a router, or an IOT hub in a way that makes maximum use of the sometime limited hardware resources of such systems.

  • 14:05 – 14:45 – Open Source Neuroimaging: Developing a State-of-the-Art Brain Scanner with Linux and FPGAs – Danny Abukalam, Codethink

Neuroimaging is an established medical field which is helping us to learn more about how the human brain works, the most complex human organ. This talk aims to cover neuroimaging systems, from hobbyist to professional, and how open source has been used to build state-of-the-art systems. We’ll have a look the general problem area, why open source was a good fit, and some examples of solutions including a commercial effort that we have been involved in bringing to market. Typically these solutions consist of specialist hardware, a bespoke software solutions stack, and a suite to manage and process the vast amounts of data generated during the scan. Other points of interest include how we approached building a maintainable and upgradeable system from the outset. We’ll also talk about future plans for neuroimaging, future ideas for hardware & discuss areas lacking good open source solutions.

  • 14:55 – 15:35 – More Robust I2C Designs with a New Fault-Injection Driver – Wolfram Sang, Renesas

It has its challenges to write code for certain error paths for I2C bus drivers because these errors usually don’t happen on the bus. And special I2C bus testers are expensive. In this talk, a new GPIO based driver will be presented which acts on the same bus as the bus master driver under inspection. A live demonstration will be given as well as hints how to handle bugs which might have been found. The scope and limitations of this driver will be discussed. Since it will also be analyzed what actually happens on the wires, this talk also serves as a case study how to snoop busses with only Free Software and OpenHardware (i.e. sigrok).

  • 16:05 – 16:45 – GStreamer for Tiny Devices – Olivier Crête, Collabora

GStreamer is a complete Open Source multimedia framework, and it includes hundreds of plugins, including modern formats like DASH, HLS or the first ever RTSP 2.0 implementation. The whole framework is almost 150MB on my computer, but what if you only have 5 megs of flash available? Is it a viable choice? Yes it is, and I will show you how.

Starting with simple tricks like only including the necessary plugins, all the way to statically compiling only the functions that are actually used to produce the smaller possible footprint.

  • 16:55 – 17:35 – Maintaining a Linux Kernel for 13 Years? You Must be Kidding Me. We Need at Least 30? – Agustin Benito Bethencourt, Codethink Ltd

Industrial grade solutions have a life expectancy of 30+ years. Maintaining a Linux kernel for such a long time in the open has not been done. Many claim that is not sustainable, but corporations that build power plants, railway systems, etc. are willing to tackle this challenge. This talk will describe the work done so far on the kernel maintenance and testing front at the CIP initiative.

During the talk it will be explained how we decide which parts of the kernel to cover – reducing the amount of work to be done and the risk of being unable to maintain the claimed support. The process of reviewing and backporting fixes that might be needed on an older branch will be briefly described. CIP is taking a different approach from many other projects when it comes to testing the kernel. The talk will go over it as well as the coming steps. and the future steps.

Wednesday, October 24

  • 11:05 – 11:45 – HDMI 4k Video: Lessons Learned – Hans Verkuil, Cisco Systems Norway

So you want to support HDMI 4k (3840×2160) video output and/or video capture for your new product? Then this is the presentation for you! I will describe the challenges involved in 4k video from the hardware level, the HDMI protocol level and up to the kernel driver level. Special attention will be given to what to watch out for when buying 4k capable equipment and accessories such as cables and adapters since it is a Wild, Wild West out there.

  • 11:55 – 12:35 – Linux Powered Autonomous Arctic Buoys – Satish Chetty, Hera Systems 

In my talk/presentation, I cover the technical, and design challenges in developing an autonomous Linux powered Arctic buoy. This system is a low cost, COTS based, extreme/harsh environment, autonomous sensor data gathering platform. It measures albedo, weather, water temperature and other parameters. It runs on a custom embedded Linux and is optimized for efficient use of solar & battery power. It uses a variety of low cost, high accuracy/precision sensors and satellite/terrestrial wireless communications.

I talk about using Linux in this embedded environment, and how I address and solve various issues including building a custom kernel, Linux drivers, frame grabbing issues and results from cameras, limited power challenges, clock drifts due to low temperature, summer melt challenges, failure of sensors, intermittent communication issues and various other h/w & s/w challenges.

  • 14:15 – 14:55 – Linux Storage System Bottleneck for eMMC/UFS – Bean Huo & Zoltan Szubbocsev, Micron

The storage device is considered a bottleneck to the system I/O performance. This thinking drives the need for faster storage device interfaces. Commonly used flash based storage interfaces support high throughputs, eg. eMMC 400MB/s, UFS 1GB/s. Traditionally, advanced embedded systems were focusing on CPU and memory speeds and these outpaced advances in storage speed improvements. In this presentation, we explore the parameters that impact I/O performance. We describe at a high level how Linux manages I/O requests coming from user space. Specifically, we look into system performance limitations in the Linux eMMC/UFS subsystem and expose bottlenecks caused by the software through Ftrace. We show existing challenges in getting maximum performance of flash-based high-speed storage device. by this presentation, we want to motivate future optimization work on the existing storage stack.

  • 15:05 – 15:45 – New GPIO Interface for User Space – Bartosz Golaszewski

Since Linux 4.8 the GPIO sysfs interface is deprecated. Due to its many drawbacks and bad design decisions a new user space interface has been implemented in the form of the GPIO character device which is now the preferred method of interaction with GPIOs which can’t otherwise be serviced by a kernel driver. The character device brings in many new interesting features such as: polling for line events, finding GPIO chips and lines by name, changing & reading the values of multiple lines with a single ioctl (one context switch) and many more. In this presentation, Bartosz will showcase the new features of the GPIO UAPI, discuss the current state of libgpiod (user space tools for using the character device) and tell you why it’s beneficial to switch to the new interface.

  • 16:15 – 16:55 – Replace Your Exploit-Ridden Firmware with Linux – Ronald Minnich, Google

With the WikiLeaks release of the vault7 material, the security of the UEFI (Unified Extensible Firmware Interface) firmware used in most PCs and laptops is once again a concern. UEFI is a proprietary and closed-source operating system, with a codebase almost as large as the Linux kernel, that runs when the system is powered on and continues to run after it boots the OS (hence its designation as a “Ring -2 hypervisor”). It is a great place to hide exploits since it never stops running, and these exploits are undetectable by kernels and programs.

Our answer to this is NERF (Non-Extensible Reduced Firmware), an open source software system developed at Google to replace almost all of UEFI firmware with a tiny Linux kernel and initramfs. The initramfs file system contains an init and command line utilities from the u-root project, which are written in the Go language.

  • 17:05 – 17:45 – Unikernelized Real Time Linux & IoT – Tiejun Chen, Vmware

Unikernel is a novel software technology that links an application with OS in the form of a library and packages them into a specialized image that facilitates direct deployment on a hypervisor. But why these existing unikernels have yet to gain large popularity broadly? I’ll talk what challenges Unikernels are facing, and discuss exploration of if-how we could convert Linux as Unikernel, and IoT could be a valuable one of use cases because the feature of smaller size & footprint are good for those resource-strained IoT platforms. Those existing unikernels are not designed to address those IoT characters like power consumption and real time requirement, and they also doesn’t support versatile architectures. Most existing Unikernels just focus on X86/ARM. As a paravirtualized unikenelized Linux, especially Unikernelized Real Time Linux, really makes Unikernels to succeed.

If you’d like to attend the real thing, you’ll need to register and pay a registration fee:

  • Early Registration Fee: US$800 (through August 27, 2017)
  • Standard Registration Fee: US$950 (August 28, 2017 – September 17, 2017)
  • Late Registration Fee: US$1100 (September 18, 2017 – Event)
  • Academic Registration Fee: US$200 (Student/Faculty attendees will be required to show a valid student/faculty ID at registration.)
  • Hobbyist Registration Fee: US$200 (only if you are paying for yourself to attend this event and are currently active in the community)

There’s also another option with the Hall Pass Registration ($150) if you just want to network on visit with sponsors onsite, but do not plan to attend any sessions or keynotes.

Fedora 26 Supports Single “Unified” OS Images for Multiple ARM Platforms

August 14th, 2017 29 comments

The decision to use device tree in Linux occurred several years ago, after Linus Torvalds complained that Linux on ARM was a mess, with the ultimate goal of providing a unified ARM kernel for all hardware. Most machine specific board files in arch/arm/mach-xxx/ are now gone from the Linux kernel, being replaced by device tree files, and in many case you simply need to replace the DTB (Device Tree Binary) file from an operating system to run on different hardware platforms. However, this is not always that easy as U-boot still often differ between boards / devices, so it’s quite frequent to distribute different firmware / OS images per board. Fedora has taken another approach, as the developers are instead distributing a single Fedora 26 OS ARMv7 image, together with an installation script.

Images for 64-bit ARM (Aarch64) are a little different since they are designed for SBSA compliant servers, so a single image will work on any server leveraging UEFI / ACPI implementation on the hardware. So what follows is specific to ARMv7 hard-float images as explained in the Wiki.

You’ll need to install Fedora Arm installer after downloading one of the Fedora 26 images. This requires an Fedora machine, and since I’m running Ubuntu 16.04, and don’t want to setup a Fedora virtual machine in Virtualbox, I used docker instead right inside Ubuntu as it’s much faster to do:

The last line requires some explanation. /media/hdd is the mount point of the storage device on the host where I download the Fedora image and that will be accessible through /mnt in docker, /dev/sdd is my micro SD card device, while /dev/sdd3 will be the rootfs partition.Note that it took me a while to get that right, and I’m not sure it works for all targets (other other /dev/sddx are also needed), so using an actual Fedora 26 installation would be easier. The rest of the instructions below are not specific to docker.

I could then install the Fedora ARM Installer and the required xz & file packages…

…and check the usage:

Let’s see how many boards are supported in /usr/share/doc/fedora-arm-installer/SUPPORTED-BOARDS file:

AllWinner Devices:
A10-OLinuXino-Lime A10s-OLinuXino-M A13-OLinuXino A13-OLinuXinoM
A20-OLinuXino-Lime A20-OLinuXino-Lime2 A20-OLinuXino_MICRO
A20-Olimex-SOM-EVB Ampe_A76 Auxtek-T003 Auxtek-T004 Bananapi Bananapro CHIP
CSQ_CS908 Chuwi_V7_CW0825 Colombus Cubieboard Cubieboard2 Cubietruck
Cubietruck_plus Hummingbird_A31 Hyundai_A7HD Itead_Ibox_A20 Lamobo_R1
Linksprite_pcDuino Linksprite_pcDuino3 Linksprite_pcDuino3_Nano MK808C
MSI_Primo73 MSI_Primo81 Marsboard_A10 Mele_A1000 Mele_A1000G_quad Mele_I7
Mele_M3 Mele_M5 Mele_M9 Mini-X Orangepi Orangepi_mini Sinlinx_SinA31s
UTOO_P66 Wexler_TAB7200 Wits_Pro_A20_DKT Yones_Toptech_BS1078_V2 ba10_tv_box
colorfly_e708_q1 difrnce_dit4350 dserve_dsrv9703c i12-tvbox iNet_86VS
icnova-a20-swac inet86dz jesurun_q5 mk802 mk802_a10s mk802ii orangepi_2
orangepi_lite orangepi_pc orangepi_plus polaroid_mid2809pxe04
pov_protab2_ips9 q8_a13_tablet q8_a23_tablet_800x480 q8_a33_tablet_1024x600
q8_a33_tablet_800x480 r7-tv-dongle sunxi_Gemei_G9

MX6 Devices:
cm_fx6 mx6cuboxi novena riotboard wandboard

OMAP Devices:
am335x_boneblack am57xx_evm kc1 omap3_beagle omap4_panda omap5_uevm

MVEBU Devices:

ST Devices:

Other Devices:
jetson-tk1 rpi2 rpi3 trimslice

So we’ve got a list of device to choose from. For example, if you wanted to install Fedora 26 server in a micro SD card for Raspberry Pi 3, you’d run something like:

You’ll then be ask to confirm:

The full process will take several minutes, and at the end you’ll get “_/” rootfs partition, “_/boot ” partition, and a “30 MB volume” with u-boot, config,etc…

I did not try the micro SD card in Raspberry Pi 3 board myself, because Geek Till It Hertz has already done it successfully on both RPi 3 and Banana Pi boards as shown in the video below.

He also showed the boards run Linux 4.11.8 version, but that can be upgraded with dnf update to Linux 4.11.11, just as on his Fedora 26 installation on a x86-64  computer.

Movidius Neural Compute Stick Shown to Boost Deep Learning Performance by about 3 Times on Raspberry Pi 3 Board

August 9th, 2017 14 comments

Intel recently launched Movidius Neural Compute Stick (MvNCS)for low power USB based deep learning applications such as object recognition, and after some initial confusions, we could confirm the Neural stick could also be used on ARM based platforms such as the Raspberry Pi 3. Kochi Nakamura, who wrote the code for GPU accelerated object recognition on the Raspberry Pi 3 board, got hold of one sample in order to compare the performance between GPU and MvNCS acceleration.

His first attempt was quite confusing as with GoogLeNet, Raspberry Pi 3 + MvNCS achieved an average inference time of about 560ms, against 320 ms while using VideoCore IV GPU in RPi3 board. But then it was discovered that the “” demo would only use one core out of the 12 VLIW 128-bit vector SHAVE processors in Intel’s Movidius Myriad 2 VPU, and after enabling all those 12 cores instead of just one, performance increased to around 108 ms average time per inference. That’s almost 3 times faster compare to using the GPU in RPi3 for this specific demo, and it may vary for other demos / applications.

That’s the description in YouTube:

Comparison of deep learning inference acceleration by Movidius’ Neural Compute Stick (MvNCS) and by Idein’s software which uses Raspberry Pi’s GPU (VideoCore IV) without any extra computing resources.

Movidius’ demo runs GoogLeNet with 16-bit floating point precision.Average inference time is 108ms.
We used MvNC SDK 1.07.07 and their official demo script without any changes. (ncapi/py_examples/stream_infer/
It seems something is wrong with the inference results.
We recompiled graph file with -s12 option to use 12 SHAVE vector processor simultaneously.

Idein’s demo also runs GoogLeNet with 32-bit floating point precision. Average inference time is 320ms.

It’s interesting to note the GPU demo used 32-bit floating point precision, against 16-bit floating point precision on the Neural Compute Stick, although it’s unclear to me how that may affect performance of such algorithms. Intel recommends a USB 3.0 interface for MvNCS, and the Raspberry Pi 3 only comes with a USB 2.0 interface that shares the bandwidth for the USB webcam and the MvNCS, so it’s possible an ARM board with a USB 3.0 interface for the stick, and a separate USB interface for the webcam could perform better. Has anybody tested it? A USB 3.0 interface and hub would also allow to cascade several Neural Compute Sticks.

How ARM Nerfed NEON Permute Instructions in ARMv8

August 7th, 2017 33 comments

This is a guest post by blu about an issue he found with a specific instruction in ARMv8 NEON. He previously wrote an article about OpenGL ES development on Ubuntu Touch, and one or two other posts.

This is not a happy-ending story. But as with most unhappy-ending stories, this is a story with certain moral for the reader. So read on if you appreciate a good moral.

Once upon a time there was a very well-devised SIMD instruction set. Its name was NEON, or formally — ARM Advanced SIMD — ASIMD for short (most people still called it NEON). It was so nice, that veteran coders versed in multiple SIMD ISAs often wished other SIMD ISAs were more like NEON.

NEON had originated as part of the larger ARM ISA version 7, or ARMv7, for short. After much success in the mobile and embedded domains, ARMv7 was superseded by what experts acknowledged as the next step in the evolution of modern ISAs – ARMv8. It was so good that it was praised by compiler writers as possibly the best ISA they could wish for. As part of all the enhancements in the new ISA, NEON too got its fair share of improvements – and so ASIMD2 superseded NEON (ARMv8’s SIMD ISA is called ASIMD2, but some call it NEON2).

Now, one of the many things the original NEON got right was the permute capabilities. Contrary to other ISAs whose architects kept releasing head-banging permute ops one after another, the architects of NEON got permutes right from the start. They did so by providing a compact-yet-powerful set of permutation ops, the most versatile of which, by far, being the tbl op and its sister op tbx; each of those provided a means to compose a SIMD vector from all the thinkable combinations of the individual byte-lanes of up to 4 source SIMD vectors. Neat. The closest thing on AMD64 is pshufb from SSSE3, but it takes a single vector as input (and the AVX2 256-bit vpshufb is borked even further).

Not only NEON had those ops on an architectural level, but the actual implementations – the different μarchitectures that embodied NEON, did so quite efficiently. Second and third generation performant ARMv7 Cortex CPUs could issue up to two tbl ops per clock and return the results as soon as 3 clocks later.

So, with this fairy tale to jump-start our story, let’s teleport ourselves to present-day reality.

I was writing down an ingenious algorithm last week, one that was meant to filter elements from an input stream. Naturally, the algorithm relied heavily on integer SIMD vectors for maximum efficiency, and it happened so that I was writing the initial version on ARM64, with plans for later translation to AMD64. Now, as part of that algorithm, a vector-wise horizontal sort had to be carried – something which is best left to a sorting network (See Ken Batcher’s Sorting Network algorithms). Sorting networks are characterized by doing a fixed number of steps to sort their input vector, and at each of those steps a good amount of permutations occur. As I was sorting a 16-lane vector (a rather wide one), its sorting network was a 10-deep one, and while some of the stages required trivial permutations, others called for the most versatile permutes of them all – the mighty tbl op. So I decided that for an initial implementation I’d use tbl throughput the sorting network.

As I was writing the algorithm away from home, I was using my trusty Ubuntu tablet (Cortex-A53, ARM64) as a workstation (yes, with a keyboard). I had a benchmark for a prima-vista version up and running off L1 cache, showing the algo performing in line with my work-per-clock expectations. It wasn’t until early on the following week that I was able to finally test it on my Cortex-A72 ARM64 workhorse desktop. And there things turned bizarre.

To my stupefaction, on the A72 the bench performed nothing like on the A53. It was effectively twice slower, both in absolute times as well as in per-clock performance (tablet is 1.5GHz, desktop is 2.0GHz but I kept it at 1.3GHz when doing nothing taxing). I checked and double-checked that the compiler had not done anything stupid – it hadn’t – disassembled code was exactly as expected, and yet, there was the ‘big’ A72, 3-decode, 8-dispatch, potent-OoO design getting owned by a ‘little’ tablet’s (or a toaster’s – A53s are so omnipresent these days) in-order, 2-decode design. Luckily for me, my ARM64 desktop is perf-clad (perf is the linux profiler used by kernel developers), so seconds later I was staring at perf reports.

There was no room for guessing – there were some huge, nay, massive stalls clumped around the permute ops. The algo was spending the bulk of its time in stalling on those permutes. Those beautiful, convenient tbl permutes – part of the reason I went to prototype the algo on ARM64 in the first place. The immediate take was that A72 tbl op performed nothing like the A53 tbl op. Time to dust up the manual, buddy. What I saw in the A72 (and A57) optimization manual had me scratch my head more than I could’ve expected.

First off, in 32-bit mode (A32) tbl op performs as I’d expect it to, and as it appears to still do on the A53 in A64 mode (64-bit):

op throughput, ops/clock latency, clocks
tbl from 1 source,  64-bit-wide 2 3
tbl from 2 sources, 64-bit-wide 2 3
tbl from 3 sources, 64-bit-wide 2 6
tbl from 4 sources, 64-bit-wide 2 6

But in 64-bit mode (A64), that transforms into:

op throughput, ops/clock latency, clocks
tbl from 1 source,  64-bit-wide 2 3 * 1 = 3
tbl from 2 sources, 64-bit-wide 2 3 * 2 = 6
tbl from 3 sources, 64-bit-wide 2 3 * 3 = 9
tbl from 4 sources, 64-bit-wide 2 3 * 4 = 12
tbl from 1 source,  128-bit-wide 2 3 * 1 + 3 = 6
tbl from 2 sources, 128-bit-wide 2 3 * 2 + 3 = 9
tbl from 3 sources, 128-bit-wide 2 3 * 3 + 3 = 12
tbl from 4 sources, 128-bit-wide 2 3 * 4 + 3 = 15

That’s right – 64-bit-wide tbl is severely penalized in A64 mode on A72 vs A32 mode. In my case, I was using the 128-bit-wide versions of the op, with 2 source arguments. So on the A72 I ended up getting (snippet of relevant code timeline):

= 12 clocks of latency for the snippet

But on the A53 same snippet yielded:

= 6 clocks of latency for the snippet

As the performance of the entire algorithm was dominated by the network sort, and the entirety of the network sort was comprised of repetitions of the above snippet, all observations fell into place — A53 was indeed twice faster (per-clock) than A72/A57 on this code, by design! So much for my elegant algorithm. Now I’d need to increase the data window so much as to be able to amortize the massive pipeline bubbles with more non-dependent work. Anything less would penalize the ‘big’ ARMv8 designs.

But that’s not what gets me in this entire story – I have no issue rewriting prototype or any other code. What does put me into contemplative mood is that code written for optimal work on A53’s pipeline could choke its ‘big’ brothers A57 & A72, and code written for optimal utilization of the pipelines of those CPUs could not necessarily be the most efficient code on the A53. All it takes is some tbl permutes. That is only exacerbated by big.LITTLE setups. That begs the question what were ARM thinking when they were designing A64 mode tbl on the ‘big’ cores.

Categories: Programming Tags: AArch64, arm, neon

Mediatek MT2625 NB-IoT SoC is Designed for Cellular IoT Devices working Worldwide

August 4th, 2017 1 comment

Mediatek has recently unveiled MT2625 SoC based on an ARM Cortex-M core, equipped with an NB-IoT “WorldMode” modem allowing for a single design worldwide, and supporting the latest 3GPP Release 14 (LTE Cat NB2) specification.

Mediatek MT2625 specifications:

  • CPU – ARM Cortex-M @ up to 104 MHz with FPU
  • Embedded Memory – 4MB PSRAM
  • Storage – 4MB NOR Flash
  • Connectivity
    • NB-IoT compatible with 3GPP Release 14
    • Full frequency band (450MHz to 2.1GHz) of 3GPP R13 (NB1) and R14 (NB2) standards
    • Integrated baseband, RF, and modem DSP
  • Peripherals – I2C,  I2S,  PCM,  SDIO,  UART
  • Power Supply – Integrated PMU

The solution will be found in products for worldwide transportation, municipal use, and consumer products, with a much longer battery life compared to existing devices relying on other 2G/3G/4G standards.

According to the press release, one of the first module based on MT2625 has been designed in collaboration with China Mobile, integrates the company’s eSIM card, and supports OneNET IoT open platform. You won’t find many details on Mediatek MT2625’s product page, but you could contact the company there, if you plan to design and deploy such modules in large quantities.