October 9, 2019 by Jean-Luc Aufranc (CNXSoft) - 14 Comments

Arm Custom Instructions Coming to Armv8-M Embedded Processors

So far, Arm has defined every instruction for its cores, which keeps code portable between solutions: code compiled for one Arm Cortex-M33 based microcontroller runs on another without modification (we’re obviously talking about code running directly on the core, not code that uses specific peripherals).

But with the open-source RISC-V architecture, many have seen the benefit of custom instructions for specific tasks, at the risk of potential fragmentation. With Arm TechCon 2019 now taking place, Arm has just announced support for custom instructions in Armv8-M embedded CPUs, starting with Arm Cortex-M33 cores.

The implementation of Arm Custom Instructions for specific embedded and IoT applications will start in H1 2020 at no additional cost to licensees. Arm says this comes without risk of software fragmentation, because a NOCP exception is raised if the instructions are not available.
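As a rough illustration of how firmware might recognize that case, here is a minimal sketch in C. It assumes the standard Armv8-M Mainline fault registers, where the NOCP flag is bit 3 of the UsageFault status field in the upper half of the Configurable Fault Status Register (CFSR). The function name `fault_is_nocp` is hypothetical, and what a handler does after detecting the fault (for example, switching to a software routine) is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural address of the Configurable Fault Status Register (SCB->CFSR). */
#define CFSR_ADDRESS   0xE000ED28u

/* NOCP ("no coprocessor") is bit 3 of the UsageFault status field, which
 * occupies the upper halfword of CFSR, so it sits at bit 16 + 3 = 19. */
#define CFSR_NOCP_MASK (1u << 19)

/* Hypothetical helper: returns true when a captured CFSR value reports
 * a NOCP UsageFault. */
bool fault_is_nocp(uint32_t cfsr)
{
    return (cfsr & CFSR_NOCP_MASK) != 0u;
}
```

On a real device, a UsageFault handler would read the register at `CFSR_ADDRESS` and pass the value to this helper. Taking the value as a parameter keeps the decoding logic testable on a host machine.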

Arm further explains:

Arm Custom Instructions are enabled by modifications to the CPU that reserve encoding space for designers to easily add custom datapath extensions while maintaining the integrity of the existing software ecosystem. This feature, together with the existing co-processor interface, enable Cortex-M33 CPUs to be extended with various types of accelerators optimized for edge compute use cases including machine learning (ML) and artificial intelligence (AI).

Specifically, Arm Custom Instructions for Armv8-M add a customizable module inside the processor which shares the same interface as the standard Arithmetic Logic Unit (ALU) of the CPU. There are multiple regions of the encoding space available for customization and you can choose up to eight regions based on the type of instructions you want to implement.

SoC designers will still have to follow predefined classes of instruction extensions for general-purpose and FPU/M-Profile Vector Extension (MVE) operations. The announcement features quotes from STMicro, NXP, and Silicon Labs, so one should probably expect new Arm Cortex-M33 MCUs with custom instructions from those companies sometime in 2020 or 2021.

Here’s an example of code (population count function) that could be optimized with custom instructions:

#include <stdint.h>

/* Count the set bits in x by testing each of the 32 bit positions in turn. */
int popcount(uint32_t x) {
  int n = 0;
  for (int i = 0; i < 32; ++i) {
    n += (x >> i) & 1;
  }
  return n;
}

Hand-written, optimized assembly would look as follows:

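MOV.W    r1, #0x55555555
AND.W    r1, r1, r0, LSR #1   // r1 = (x >> 1) & 0x55555555
SUBS     r0, r0, r1           // x = 2-bit partial counts
MOV.W    r1, #0x33333333
AND.W    r1, r1, r0, LSR #2   // r1 = (x >> 2) & 0x33333333
BIC      r0, r0, #0xCCCCCCCC  // r0 = x & 0x33333333
ADD      r0, r1               // x = 4-bit partial counts
MOV.W    r1, #0x01010101
ADD.W    r0, r0, r0, LSR #4   // x = x + (x >> 4)
BIC      r0, r0, #0xF0F0F0F0  // x = per-byte counts
MULS     r0, r1, r0           // top byte = sum of the four byte counts
LSRS     r0, r0, #24          // r0 = population count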
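For readers who prefer C, the assembly above appears to follow the well-known bit-parallel population count, which sums bits in progressively wider fields instead of looping over each bit. The sketch below is my own transcription of that technique, not code from Arm, and the function name `popcount_parallel` is made up for illustration.

```c
#include <stdint.h>

/* Bit-parallel population count: add up bits in 2-bit, 4-bit and 8-bit
 * fields, then sum the four byte counts with a single multiply. */
int popcount_parallel(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit partial counts */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit partial counts */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* per-byte counts */
    return (int)((x * 0x01010101u) >> 24);            /* sum lands in top byte */
}
```

Each C statement maps onto two or three of the instructions in the listing, which is why even the optimized version still needs a dozen instructions.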

This code could be replaced by a single custom instruction that saves space and improves performance and efficiency by executing in just one cycle:

CX1A p0, r0, #0 // population in r0, return r0
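From C, such an instruction would more likely be reached through a compiler intrinsic than through inline assembly. The sketch below assumes a toolchain implementing the Arm C Language Extensions for custom datapath instructions (the `<arm_cde.h>` header, the `__arm_cx1a` intrinsic, and the `__ARM_FEATURE_CDE` macro). It also assumes, as in the example above, that the chip vendor has wired coprocessor 0 with immediate 0 to a population count. That mapping is vendor-specific and purely hypothetical here, as is the wrapper name `popcount_cx`. When the feature macro is absent, the wrapper falls back to plain C so the same source still builds on other cores.

```c
#include <stdint.h>

#if defined(__ARM_FEATURE_CDE)
#include <arm_cde.h>
#endif

/* Hypothetical wrapper: uses the custom instruction when the compiler
 * targets a core with custom datapath support, plain C otherwise. */
uint32_t popcount_cx(uint32_t x)
{
#if defined(__ARM_FEATURE_CDE)
    /* Assumption: coprocessor 0, immediate 0 implements popcount on this
     * particular chip. Both arguments must be compile-time constants. */
    return __arm_cx1a(0, x, 0);
#else
    /* Portable fallback: clear the lowest set bit until none remain. */
    uint32_t n = 0;
    while (x != 0u) {
        x &= x - 1u;
        ++n;
    }
    return n;
#endif
}
```

A compile-time switch like this only covers builds for different targets. A single binary that must run on chips with and without the instruction would need a runtime mechanism instead, such as handling the NOCP exception mentioned earlier.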

More details, including a whitepaper, can be found on the product page.

Jean-Luc Aufranc (CNXSoft)

Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager, and starting to write daily news, and reviews full time later in 2011.

14 Comments

blu

5 years ago

Makes sense. MCUs are the ideal medium for tinkering with instructions. Applications processors, less so.

dgp

5 years ago

Undocumented vendor specific extensions that only work with some hacked up tarball of GCC 4.8 here we come!

blu

5 years ago

dgp

I thought you were being enthusiastic about RV32 not long ago?..

dgp

5 years ago

blu

I’m against vendor specific extensions anywhere and yes they do already exist in the commercial RISC-V cores. The difference here is that RISC-V is a foundation not a single company and hopefully the extensions that survive will be those that become official RISC-V extensions. The real irony here is that ARM has such a presence in the MCU area because a common core that the silicon vendors couldn’t mess with (not with the license for cheap MCU parts anyway) did a lot to do away with the 500MB zipped IDE that only ran on Windows XP that people used to… Read more »

blu

5 years ago

dgp

> The difference here is that RISC-V is a foundation not a single company and hopefully the extensions that survive will be those that become official RISC-V extensions.

I’m not seeing the difference though. Arm A-profile (not even M-profile) already has 3rd-party-originating extensions that made it into the core ISA. I’m not aware of such stuff in RV G (it inevitably will happen one day, just not yet). Actually, the fact it’s one company versus a foundation means faster turnaround: RV V is still pre-1.0, while arm licensees already ship SVE silicon. But we’re getting off topic. One of… Read more »

dgp

5 years ago

blu

>I’m not seeing the difference though.

For the foundation to work there has to be cross vendor cooperation, patent sharing etc. If they don’t play ball they can’t put the RISC-V mark on their stuff. No RISC-V mark means you aren’t going to be showing up in my digikey parametric search.

> 3rd-party-originating extensions that made it into the core ISA.

Which means they are documented and supported by vendor-neutral tooling and can’t be used to lock you to a specific vendor’s version of ARM. Perfect.

>Same with Xtensa, another successful MCU ISA.

Xtensa, ARC etc are really only used… Read more »

blu

5 years ago

dgp

Clearly the core, arm-sanctioned ISA is not going anywhere. And your digikey parametric searches will forever be safe as long as they include ‘m0, m3, .. mN’ and not ‘Joe’s private extension 768’. Whether the parts you get hits for may *also* contain ‘JPE768’ should be largely irrelevant, as those extensions will be essentially dark silicon to you.

For the record, I also expect M-parts with custom extensions to be really only used in ASICs.

dgp

5 years ago

blu

>Clearly the core, arm-sanctioned ISA is not going anywhere.

That’s not the problem. The problem is silicon vendors getting control over part of the ISA, adding undocumented crap and then putting out binary middleware that uses it. One day they decide actually the crap they added was a bad idea, drop it and leave you stuck on an island with a single discontinued part that can run that piece of middleware. They already do this and really don’t need extra tools to make the situation even worse. This applies for RISC-V too.

blu

5 years ago

dgp

At the end of the day it’s their discretion — they may or may not have a very good reason ™ for wanting those extensions in their silicon (maybe it’s that extension that made their product a success in the first place?). In your turn, it’s your discretion whether you want to have anything to do with it or not. I mean, I do ISA extensions at my dayjob that I don’t feel happy about, but I still do them for being well compensated. It’s all a trade-off. Back on topic: ISAs do matter, and things have been moving in… Read more »

dgp

5 years ago

blu

>Back on topic: ISAs do matter,

As I said before, much of the reason that all of the different funky MCU ISAs have gone by the wayside is because ARM offered standardised cores, generic tools and so forth. This might be great business for ARM and silicon vendors but in the long run it means some poor guy desperately trying to find the special patched GCC tarball a few years down the line.

blu

5 years ago

dgp

In the long run it means devices with BOMs not burdened by performance margins imposed by ISA limitations, which limitations are usually solved either by clock margins (GHz MCUs, anybody?) or FPGAs — if you think gcc tarballs are bad, you should see some of the FPGA tools out there. That ‘poor guy desperately trying to find patched tools’ is usually well compensated for their efforts. That compensation comes from sales. Those sales come from — you’d never guess it — product competitiveness.

dgp

5 years ago

blu

>you should see some of the FPGA tools out there.

I have. For the older Xilinx stuff you need windows xp because it’s almost impossible to get ISE to run properly anywhere else.

>That ‘poor guy desperately trying to find patched tools’ is usually
>well compensated for their efforts.

I’m not sure why you keep feeling the need to talk to me like you’re teaching me something. Firmware I have written is out in the world running on well over a million cortex m? chips.

blu

5 years ago

dgp

> I’m not sure why you keep feeling the need to talk to me like you’re teaching me something. Firmware I have written is out in the world running on well over a million cortex m? chips.

Good for you. I’m glad I haven’t taught you anything this time around. You fooled me with your first post, seemingly contradicting even your own prev stances, so I switch to basics yet again, mea culpa. So are you after all excited for RV32 or not? You do realize you’d have quite some gcc tarball hunting to do with many of its… Read more »

dgp

5 years ago

blu

>So are you after all excited for RV32 or not?

Did I say I was excited for RISC-V because it had vendor extensions? I don’t think so. I remember talking about official optional extensions that have been implemented specifically so that you don’t need special GCC versions, can be trapped and handled in ROM code etc. FYI: I’m not actually excited about RISC-V because of the ISA. I think in a few years we will see access to old crappy fab processes open up like we have for PCBs and we might get to the point where almost anyone can… Read more »