How One Line of Code Tripled Allwinner A20 SATA Write Performance

If you’ve been following this blog long enough, you may remember that the linux-sunxi community’s work on improving U-Boot and Linux support for Allwinner processors started with the Allwinner A10 processor found in the MeLE A1000 TV box back in 2012. At the time, it provided an interesting alternative to the Raspberry Pi board, which was in short supply at launch and for several months afterwards.

One of the most interesting features of the Allwinner A10 single-core Arm Cortex-A8 processor was its SATA interface, and Allwinner A20 was announced a few months later with a dual-core Cortex-A7 processor and virtually the same peripherals as Allwinner A10, including SATA. However, when I tested the CubieTruck board connected to a mechanical drive, I noticed sequential SATA performance was fine for reads (~180MB/s), but writes were fairly slow at around 36 MB/s.

Other people complained about it, and some looked into it, and at one point it appeared the maximum SATA write performance for Allwinner A10/A20 was 45MB/s due to either buggy silicon or driver problems.

It turns out it may just have been a driver problem, as a recent patch changing one line of code enables write speeds about three times faster (a 200% improvement).

Most of us are not familiar with Allwinner SATA DMA registers, but luckily the patch explains what’s going on here:

Increasing the SATA/AHCI DMA TX/RX FIFOs (P0DMACR.TXTS and .RXTS) from default 0x0 each to 0x3 each gives a write performance boost of 120MB/s from lame 36MB/s to 45MB/s previously. Read performance is about 200MB/s [tested on SSD using dd bs=4K count=512K].

Tested on the Banana Pi R1 (aka Lamobo R1) and Banana Pi M1 SBCs
with Allwinner A20 32bit-SoCs (ARMv7-a / arm-linux-gnueabihf).
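
The register change described in the patch can be illustrated with a short sketch. This is Python for readability, not the actual driver code (which is C). The field layout is an assumption: it follows Texas Instruments documentation for a similar SATA controller, where TXTS sits in bits 0-3 and RXTS in bits 4-7, since Allwinner does not document P0DMACR. The helper names and the starting register value are hypothetical.

```python
# Sketch of a clear-and-set update on a 32-bit register value.
# ASSUMPTION: TXTS occupies bits 0-3 and RXTS bits 4-7 of P0DMACR, as in
# TI's documentation for a similar SATA controller. Not confirmed by
# Allwinner documentation.

TXTS_SHIFT = 0    # assumed position of the TX transaction size field
RXTS_SHIFT = 4    # assumed position of the RX transaction size field
FIELD_MASK = 0xF  # each field is assumed to be 4 bits wide


def set_field(reg, shift, value):
    """Return reg with the 4-bit field at `shift` replaced by `value`."""
    if not 0 <= value <= FIELD_MASK:
        raise ValueError("field value must fit in 4 bits")
    cleared = reg & ~(FIELD_MASK << shift)
    return (cleared | (value << shift)) & 0xFFFFFFFF


def get_field(reg, shift):
    """Extract the 4-bit field at `shift` from reg."""
    return (reg >> shift) & FIELD_MASK


def bump_transaction_sizes(reg, size=0x3):
    """Set both TXTS and RXTS to `size`, leaving the other bits untouched."""
    reg = set_field(reg, TXTS_SHIFT, size)
    return set_field(reg, RXTS_SHIFT, size)


old = 0x00004400  # hypothetical value with TXTS = RXTS = 0x0 (the default)
new = bump_transaction_sizes(old)
print(hex(old), "->", hex(new))  # prints: 0x4400 -> 0x4433
```

Only the two low fields change, from 0x0 to 0x3 each, which matches the description in the patch; the upper bits in the example are purely illustrative.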

I tried to look into Allwinner A20 public documentation, but I could not find anything about P0DMACR or many details about SATA registers, as only the SATA clock appears to be documented. Maybe that explains why it took 7 years to fix this performance issue…

Igor of Armbian tested the patch on Cubietruck with the more reliable iozone benchmark, and the results look great:

A sequential write speed of 38875 KB/s with Linux 4.19 was vastly improved to 127084 KB/s by applying this one-line patch. It’s great, and there do not seem to be any side effects so far. The patch looks fairly new, so more testing may be needed. If you are running Armbian and can wait a bit, you won’t need to apply the patch yourself since it is part of Debian & Ubuntu releases. Uenal Mutlu also submitted his patch to the Linux kernel mailing list, so it should be part of Linux 5.2.
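
For readers who want to check the "tripled" claim against the iozone numbers, here is a minimal sketch of the arithmetic. The function name is hypothetical; the two throughput figures are the ones quoted above.

```python
def speedup(before_kbs, after_kbs):
    """Return (ratio, percent improvement) between two throughput figures."""
    ratio = after_kbs / before_kbs
    return ratio, (ratio - 1.0) * 100.0


# Sequential write results from the Cubietruck iozone test, in KB/s
ratio, pct = speedup(38875, 127084)
print(f"{ratio:.2f}x faster, {pct:.0f}% improvement")
# prints: 3.27x faster, 227% improvement
```

A ratio of about 3.27 means writes are a little more than three times faster, which corresponds to an improvement of roughly 227% over the old figure.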

34 Replies to “How One Line of Code Tripled Allwinner A20 SATA Write Performance”

  1. It’s great, we all remember how A20’s SATA performance was not impressive! The only sad thing is that nobody from AllWinner cared to have a look at this issue while these chips were still relevant several years ago, when everyone was bashing them for their performance.

    1. The A20 is still relevant in many ways. It’s still being produced and used in new boards. It’s become one of the arm chips with the best mainline kernel support. And SATA support makes it stand out.

      1. Sure, new boards are still being built. But considering how power-hungry this 40nm chip was for only 2×1008 MHz I suspect these boards definitely do not focus on performance nor efficiency, and might even continue to rely on the outdated 3.4.39 horrible kernel for legacy reasons, so they will not even benefit from this fix anyway (just like the thousands that were issued since).

  2. It’s not only sequential write performance improving but also random IO benefitting a lot, see doubled numbers at 16K block size. What’s needed now are testers who

    * use btrfs to run some really intensive HDD and SSD tests 24/7 (iozone in a loop for example)
    * test Uenal’s patch also with Allwinner R40/V40 (BPi M2 Ultra/Berry)
    * test the patch on A10 devices

    Using a ‘checksummed’ filesystem for reliability tests is important since even minor data corruption will be reported. And while this is a nice improvement for A10/A20, the limited CPU performance of these old Allwinner SoCs makes the CPU a bottleneck here pretty early, so the most interesting target of this patch might be the quad-core successor, unfortunately only used on Bananas.

      1. No idea, never used dm-integrity. At least using btrfs for this use case is really easy. Attach a SATA device and get the device node (assuming /dev/sda) and then it’s just

        Checking dmesg output for data corruption issues and a final btrfs scrub is mandatory of course.

          1. That’s indeed interesting. I’ll do some tests when time permits comparing especially CPU utilization. But to be honest: one of the killer features of both ZFS and btrfs is snapshot handling (especially being able to send them to other disks/devices via send/receive commands). With LVM based snapshots at least in the past performance dropped significantly once there are a few snapshots. But maybe this changed too in the meantime?

          2. I don’t know about any recent improvements, as I’m new to this. But I agree that send and receive is very interesting. I’m still planning to build some experimental NAS but as usual time and to a certain extent also money…

    1. Just built an Armbian/Buster image (4.19.38) to test on Olimex Lime (A10). Close to 100 MB/s with sequential reads and writes on a btrfs on a Samsung EVO750. CPU utilization when reading 100%, with writes it’s ~80% — the old and boring single core A8 has become a bottleneck with this use case.

      Now starting with reliability / data integrity testing 24/7 for the next few days…

  3. I can see more details about the patch and registers now.
    Simply amazing. Uenal Mutlu (the developer) had to look into Texas Instruments documentation in order to understand/guess how Allwinner’s registers work.

    Allwinner is just shooting itself in the foot by not releasing proper documentation. I wonder if they lost any sales due to their SATA performance issue.

    1. They probably don’t care because they target STBs, tablets, and various embedded devices where this is not important enough. For sure, some advanced users know the devices will not match their expectations and will not buy them. But this may represent far less than 1% of their market and they don’t care at all. They could have at least looked at the cause of the low performance and fixed it themselves to save their image, but they did not even do that.

      1. I would believe Allwinner in general didn’t take much care about anything that happened outside their traditional sales channels in the beginning. AW management even might have considered linux-sunxi as enemies in the past (the whole ‘GPL violations’ show) and as far as I know this changed just recently.

        Now there’s even one AW employee dedicated to open source who sporadically contributes directly to linux-sunxi wiki with documentation and answers community’s questions. It would be great to establish a contact between Uenal and Wink to probably further improve SATA (maybe more board makers will then pick up R40/V40 or A40i to create inexpensive boards with native SATA now that performance is perfectly fine for spinning rust — but even then this is still ‘far less than 1% of their market’).

    2. @Jean-Luc Improvement is actually already in all Armbian Debian/Ubuntu releases where kernel is 4.19.y not just in upcoming Debian 10.

  4. Please, please this should be “fixes: ” for the stable 4.19 kernel (Debian 10 Buster).

        1. Oh. Then Armbian Stretch with 300+, mainly Allwinner related, patches/improvements over generic 4.19.y kernel … or Jessie, Xenial, Bionic and Disco user space with the same kernel.

  5. “I tried to look into Allwinner A20 public documentation, but I could not find anything about P0DMACR or much details about SATA registers, as only the SATA clock appears to be documented. Maybe that explains why it took 7 years to fix this performance issue…”

    When do we get a law to impose penalties on companies not documenting their chips properly?

    FFII looked at this issue back in 1997, and the situation has not changed much.

    As long as those manufacturers don’t have substantial fines for providing bad documentation, the debate is not gonna go anywhere.

    1. Legislation will have the exact opposite effect: Instead of getting documentation* you just won’t have access to the bounty of cheap Chinese chips anymore.

      *Assuming allwinner et al. actually have the documentation you want. They all buy IP blocks and hack up the provided reference code until it works. In a lot of cases they probably have no idea how it works either.

      1. And if devices with such undocumented chips cannot be sold on European market, what will they do? Ignore the European market?

        1. Probably, just like some American web sites are now closed to Europeans thanks to the stupid GDPR.

        2. >And if devices with such undocumented chips cannot be sold on European market, what will they do?

          Find some loophole that allows some category of product to bypass the regulations, i.e. old products being grandfathered in, to keep selling or do what they’ve been doing all along and just ignore the regulations altogether.

          >Ignore the European market?

          That would be the ultimate outcome and the thing is it doesn’t really hurt them as they never gave a shit about regulations, the GPL etc. It hurts all of the companies in the EU that benefit from not having to pay NXP or TI premium prices, like startups trying to make their way onto the market by remortgaging the founders’ homes. Bigger companies like Amazon and Google that ship undocumented 3.x kernel junk from Mediatek etc will just litigate their way around it or chalk the fines up to the cost of doing business.

  6. A very useful patch.
    I have a pcDuino A20 and a Cubietruck A20.
    They were useless with slow SATA writing, so I just put them in a box for storage.
    I’m thinking again of reusing them for my portable Nextcloud with a cheap DRAM-less SSD.

    Going to try with Debian Buster (and going to check whether CentOS 7 armv7 already has the patch or not)

  7. I wonder where all these A10/A20 based boards are. Seems everybody threw them away. On the LibreELEC forum there is also no reaction to the request for testing A10/A20 images.
    You don’t find them on second-hand sites either, so if somebody (in Europe) still has a BPI-R1/Lamobo R1 for sale for a reasonable price…

    1. Olimex is still selling their OlinuXino boards, and they just wrote a blog post about their Allwinner boards to explain why they haven’t made new boards with other chips so far.

      1. Lots of A20 boards on Amazon too, and on Aliexpress the A20 sells as a thin client, from several sources.
