How One Line of Code Tripled Allwinner A20 SATA Write Performance

If you’ve been following this blog long enough, you may remember that all the linux-sunxi community work aimed at improving u-boot and Linux support on Allwinner processors started with the Allwinner A10 processor found in the MeLE A1000 TV box back in 2012. At the time it offered an interesting alternative to the Raspberry Pi board, which was in short supply at launch and for several months afterwards.

One of the most interesting features of the Allwinner A10 single core Arm Cortex-A8 processor was its SATA interface, and Allwinner A20 was announced a few months later with a dual core Cortex-A7 processor and virtually the same peripherals as Allwinner A10, including SATA. However, when I tested the CubieTruck board connected to a mechanical drive, I noticed sequential SATA performance was fine for reads (~180MB/s), but writes were fairly slow at around 36 MB/s.

Other people complained about it, some looked into it, and at one point it appeared the maximum SATA write performance for Allwinner A10/A20 was 45MB/s, due to either buggy silicon or driver problems.

It turns out it may just have been a driver problem, as a recent patch changing one line of code makes writes about three times faster (a 200% improvement).

Most of us are not familiar with Allwinner SATA DMA registers, but luckily the patch explains what’s going on here:

Increasing the SATA/AHCI DMA TX/RX FIFOs (P0DMACR.TXTS and .RXTS) from default 0x0 each to 0x3 each gives a write performance boost of 120MB/s from lame 36MB/s to 45MB/s previously. Read performance is about 200MB/s [tested on SSD using dd bs=4K count=512K].

Tested on the Banana Pi R1 (aka Lamobo R1) and Banana Pi M1 SBCs
with Allwinner A20 32bit-SoCs (ARMv7-a / arm-linux-gnueabihf).
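
To make the register change a little more concrete, here is a minimal sketch of the read-modify-write the patch description implies. It is only a model in Python, not the kernel code: the field positions (TXTS in bits 3:0, RXTS in bits 7:4), the helper names and the example starting value are all assumptions, since the public A20 documentation does not describe P0DMACR.

```python
# Hypothetical model of the AHCI port 0 DMA control register (P0DMACR).
# Field positions below are ASSUMPTIONS for illustration only.
TXTS_SHIFT = 0    # assumed: TX transaction size field in bits 3:0
RXTS_SHIFT = 4    # assumed: RX transaction size field in bits 7:4
FIELD_MASK = 0xF  # each field is assumed to be 4 bits wide


def clrsetbits(value, clr_mask, set_bits):
    """Clear the bits in clr_mask, then OR in set_bits (32-bit register)."""
    return (value & ~clr_mask & 0xFFFFFFFF) | set_bits


def set_fifo_sizes(p0dmacr, txts, rxts):
    """Return the register value with new TXTS/RXTS fields, other bits untouched."""
    clr = (FIELD_MASK << TXTS_SHIFT) | (FIELD_MASK << RXTS_SHIFT)
    new = ((txts & FIELD_MASK) << TXTS_SHIFT) | ((rxts & FIELD_MASK) << RXTS_SHIFT)
    return clrsetbits(p0dmacr, clr, new)


if __name__ == "__main__":
    old = 0x00004400  # made-up starting value with TXTS = RXTS = 0x0
    print(hex(set_fifo_sizes(old, 0x3, 0x3)))  # prints 0x4433
```

The point is simply that both fields move from 0x0 to 0x3 in a single masked write while the rest of the register is left alone, which is why the fix fits in one line.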

I tried to look into Allwinner A20 public documentation, but I could not find anything about P0DMACR or many details about SATA registers, as only the SATA clock appears to be documented. Maybe that explains why it took 7 years to fix this performance issue…

Igor of Armbian tested the patch on Cubietruck with the more reliable iozone benchmark, and the results look great:


A sequential write of 38875 KB/s with Linux 4.19 was vastly improved to 127084 KB/s by applying this one-line patch. It’s great, and there do not seem to be any side effects so far. The patch looks fairly new, so more testing may be needed. If you are running Armbian and can wait a bit, you won’t need to apply the patch yourself since it is part of Debian & Ubuntu releases. Uenal Mutlu also submitted his patch to the Linux Kernel mailing list, so it should be part of Linux 5.2.
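
For those who want to check the "about three times" claim against the iozone numbers, a quick sketch of the arithmetic follows. The function name is just something picked for this example.

```python
def speedup(before_kbs, after_kbs):
    """Return (ratio, percent improvement) for two throughput figures."""
    ratio = after_kbs / before_kbs
    return ratio, (ratio - 1) * 100


# Sequential write figures from the iozone run quoted above (KB/s)
ratio, pct = speedup(38875, 127084)
print(f"{ratio:.2f}x, +{pct:.0f}%")  # prints 3.27x, +227%
```

So the Cubietruck result is a bit better than the round "three times" figure, which corresponds to a 200% improvement.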

willy
Guest
willy

It’s great, we all remember how A20’s SATA performance was not impressive! The only sad thing is that nobody from Allwinner cared to have a look at this issue while these chips were still relevant several years ago, when everyone was bashing them for their performance.

Roger
Guest
Roger

The A20 is still relevant in many ways. It’s still being produced and used in new boards. It’s become one of the arm chips with the best mainline kernel support. And SATA support makes it stand out.

willy
Guest
willy

Sure, new boards are still being built. But considering how power-hungry this 40nm chip was for only 2×1008 MHz I suspect these boards definitely do not focus on performance nor efficiency, and might even continue to rely on the outdated 3.4.39 horrible kernel for legacy reasons, so they will not even benefit from this fix anyway (just like the thousands that were issued since).

tkaiser
Guest
tkaiser

It’s not only sequential write performance improving but also random IO benefitting a lot, see doubled numbers at 16K block size. What’s needed now are testers who

* use btrfs to run some really intensive HDD and SSD tests 24/7 (iozone in a loop for example)
* test Uenal’s patch also with Allwinner R40/V40 (BPi M2 Ultra/Berry)
* test the patch on A10 devices

Using a ‘checksummed’ filesystem for reliability tests is important since even minor data corruption will be reported. And while this is a nice improvement for A10/A20, the limited CPU performance of these old Allwinner SoCs means the CPU becomes a bottleneck pretty early, so the most interesting target of this patch might be the quad-core successor, which is unfortunately only used on Bananas.

Diego
Guest
Diego

How about using dm-integrity instead of btrfs? Might be slightly lighter on CPU?

tkaiser
Guest
tkaiser

No idea, never used dm-integrity. At least using btrfs for this use case is really easy. Attach a SATA device and get the device node (assuming /dev/sda) and then it’s just

Checking dmesg output for data corruption issues and a final btrfs scrub is mandatory of course.
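
The exact commands are not shown above, so what follows is not tkaiser’s sequence, only a minimal sketch of what such a run could look like. It assumes the device node /dev/sda mentioned in the comment, a made-up mount point, and an arbitrary iozone file size and loop count. The script only builds and prints the commands, since mkfs.btrfs would wipe the drive.

```python
import shlex


def build_test_plan(device="/dev/sda", mountpoint="/mnt/sata-test", loops=3):
    """Build (but do not run) a btrfs-based stress test as a list of commands.

    device, mountpoint and loops are assumptions for this sketch.
    """
    plan = [
        ["mkfs.btrfs", "-f", device],    # DESTROYS all data on the device
        ["mkdir", "-p", mountpoint],
        ["mount", device, mountpoint],
    ]
    for _ in range(loops):
        # iozone in auto mode with flush timing, on a file inside the btrfs mount
        plan.append(["iozone", "-e", "-a", "-s", "100M",
                     "-f", f"{mountpoint}/iozone.tmp"])
    # Final scrub in the foreground, then look at the kernel log for errors
    plan.append(["btrfs", "scrub", "start", "-B", mountpoint])
    plan.append(["dmesg"])
    return plan


if __name__ == "__main__":
    for cmd in build_test_plan():
        print(shlex.join(cmd))
```

Printing the plan first makes it easy to double-check the device node before running anything as root.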

Diego
Guest
Diego

I recently stumbled over this: https://gist.github.com/MawKKe/caa2bbf7edcc072129d73b61ae7815fb

and integritysetup when I was looking for useful alternatives for zfs and btrfs, it’s also what RH is suggesting.

tkaiser
Guest
tkaiser

That’s indeed interesting. I’ll do some tests when time permits comparing especially CPU utilization. But to be honest: one of the killer features of both ZFS and btrfs is snapshot handling (especially being able to send them to other disks/devices via send/receive commands). With LVM based snapshots at least in the past performance dropped significantly once there are a few snapshots. But maybe this changed too in the meantime?

Diego
Guest
Diego

I don’t know about any recent improvements, as I’m new to this. But I agree that send and receive is very interesting. I’m still planning to build some experimental NAS but as usual time and to a certain extent also money…

roel
Guest
roel

Will this also apply to the Cubieboard A10? Or only A20 SoCs?

tkaiser
Guest
tkaiser

Just built an Armbian/Buster image (4.19.38) to test on Olimex Lime (A10). Close to 100 MB/s with sequential reads and writes on a btrfs on a Samsung EVO750. CPU utilization when reading 100%, with writes it’s ~80% — the old and boring single core A8 has become a bottleneck with this use case.

Now starting with reliability / data integrity testing 24/7 for the next few days…

tkaiser
Guest
tkaiser

> Now starting with reliability / data integrity testing 24/7 for the next few days…

After 3 days of continuous stress testing a btrfs filesystem on A10 Lime, there is no performance degradation and no data corruption. While I used different data access patterns (see https://irclog.whitequark.org/linux-sunxi/2019-05-13#24595330; for details) I’m not sure this covers all corner cases.

xnc-hardware
Guest
xnc-hardware

Please, please this should be “fixes: ” for the stable 4.19 kernel (Debian 10 Buster).

Igor Pecovnik
Guest
Igor Pecovnik

It’s already fixed in Armbian Buster 🙂

tkaiser
Guest
tkaiser

He asked for a stable OS.

Igor Pecovnik
Guest
Igor Pecovnik

Oh. Then Armbian Stretch with 300+, mainly Allwinner related, patches/improvements over generic 4.19.y kernel … or Jessie, Xenial, Bionic and Disco user space with the same kernel.

David Willmore
Guest
David Willmore

Buster isn’t the ‘stable’ release of Debian, Stretch is.

zoobab
Guest

Mele A1000, I still have one taking dust! Good memories!

zoobab
Guest

“I tried to look into Allwinner A20 public documentation, but I could not find anything about P0DMACR or much details about SATA registers, as only the SATA clock appears to be documented. Maybe that explains why it took 7 years to fix this performance issue…”

When do we get a law to impose penalties on companies not documenting their chips properly?

FFII looked at this issue back in 1997, and the situation has not changed much.

As long as those manufacturers don’t have substantial fines for providing bad documentation, the debate is not gonna go anywhere.

dgp
Guest
dgp

Legislation will have the exact opposite effect: Instead of getting documentation* you just won’t have access to the bounty of cheap Chinese chips anymore.

*Assuming allwinner et al. actually have the documentation you want. They all buy IP blocks and hack up the provided reference code until it works. In a lot of cases they probably have no idea how it works either.

zoobab
Guest

And if devices with such undocumented chips cannot be sold on European market, what will they do? Ignore the European market?

willy
Guest
willy

Probably, just like some American web sites are now closed to Europeans thanks to the stupid GDPR.

dgp
Guest
dgp

>And if devices with such undocumented chips cannot be sold on European market, what will they do?

Find some loophole that allows some category of product to bypass the regulations, i.e. old products being grandfathered in, to keep selling or do what they’ve been doing all along and just ignore the regulations altogether.

>Ignore the European market?

That would be the ultimate outcome and the thing is it doesn’t really hurt them as they never gave a shit about regulations, the GPL etc. It hurts all of the companies in the EU that benefit from not having to pay NXP or TI premium prices, like startups trying to make their way onto the market by remortgaging the founders’ homes. Bigger companies like Amazon and Google that ship undocumented 3.x kernel junk from Mediatek etc. will just litigate their way around it or chalk the fines up to the cost of doing business.

Canta
Guest
Canta

A very useful patch.
I have a pcDuino A20 and a Cubietruck A20.
They were useless with slow SATA writing, so I just put them away in a box for storage.
Now I’m thinking again about reusing them for my portable Nextcloud with a cheap DRAM-less SSD.

Going to try with Debian Buster (and going to check whether CentOS 7 armv7 already has the patch or not).

roel
Guest
roel

I wonder where all these A10/A20 based boards are. Seems everybody threw them away. On the LibreELEC forum there is also no reaction to the request for testing A10/A20 images.
You don’t find them on second-hand sites either, so if somebody (in Europe) still has a BPI-R1/Lamobo R1 for sale at a reasonable price…

tkaiser
Guest
tkaiser

> I wonder where all these A10/A20 based boards are

Here they are: https://geizhals.de/?cat=mbarm&xf=3749_Allwinner%7E8195_1 — if the aforementioned patch also improves SATA performance on R40/V40 currently the most interesting board for such use cases might be the BPi M2 Berry (only downside: prone to underpowering thanks to crappy Micro USB for power).

tkaiser
Guest
tkaiser

> BPI-R1/Lamobo R1

Great example for a company ‘interacting’ with community: http://forum.banana-pi.org/t/new-hardware-revision-of-r1/4550 (Nora Lee is Banana Pi product manager at Foxconn, Lion Wang is SinoVoip CEO)

itchy n scratchy
Guest
itchy n scratchy

LOL