How One Line of Code Tripled Allwinner A20 SATA Write Performance

If you’ve been following this blog long enough, you may remember that all linux-sunxi community work on improving U-Boot and Linux support for Allwinner processors started with the Allwinner A10 processor found in the MeLE A1000 TV box back in 2012. At the time, it provided an interesting alternative to the Raspberry Pi board, which was in short supply at launch and for several months after.

One of the most interesting features of the Allwinner A10 single-core Arm Cortex-A8 processor was its SATA interface, and the Allwinner A20 was announced a few months later with a dual-core Cortex-A7 processor and virtually the same peripherals as the Allwinner A10, including SATA. However, when I tested the Cubietruck board connected to a mechanical drive, I noticed sequential SATA performance was fine for reads (~180 MB/s), but writes were fairly slow at around 36 MB/s.

Other people complained about it, some looked into it, and at one point it appeared the maximum SATA write performance for Allwinner A10/A20 was 45 MB/s, due to either buggy silicon or driver problems.

It turns out it may just have been a driver problem, as a recent patch changing a single line of code makes writes about three times faster (a 200% improvement).

Most of us are not familiar with Allwinner SATA DMA registers, but luckily the patch explains what’s going on here:

Increasing the SATA/AHCI DMA TX/RX FIFOs (P0DMACR.TXTS and .RXTS) from default 0x0 each to 0x3 each gives a write performance boost to 120MB/s from lame 36MB/s to 45MB/s previously. Read performance is about 200MB/s [tested on SSD using dd bs=4K count=512K].

Tested on the Banana Pi R1 (aka Lamobo R1) and Banana Pi M1 SBCs
with Allwinner A20 32bit-SoCs (ARMv7-a / arm-linux-gnueabihf).
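
The patch message describes a read-modify-write of two bit fields in one register. As a rough illustration only, the Python sketch below models that update on a plain integer. The bit positions are assumptions (TXTS in bits 3:0 and RXTS in bits 7:4, each 4 bits wide), since the public A20 documentation does not describe P0DMACR; the helper names are hypothetical and this is not the kernel driver's code.

```python
# Hypothetical model of a read-modify-write on a 32-bit DMA control
# register such as P0DMACR. Field positions are ASSUMED, not documented:
# TXTS in bits [3:0] and RXTS in bits [7:4], 4 bits each.

TXTS_SHIFT = 0
RXTS_SHIFT = 4
FIELD_MASK = 0xF  # assumed 4-bit wide fields


def clrsetbits(value: int, clear_mask: int, set_bits: int) -> int:
    """Clear the bits in clear_mask, then OR in set_bits (32-bit result)."""
    return ((value & ~clear_mask) | set_bits) & 0xFFFFFFFF


def set_dma_fifo_sizes(reg: int, txts: int, rxts: int) -> int:
    """Return reg with the (assumed) TXTS and RXTS fields replaced."""
    if not (0 <= txts <= FIELD_MASK and 0 <= rxts <= FIELD_MASK):
        raise ValueError("field value out of range")
    mask = (FIELD_MASK << TXTS_SHIFT) | (FIELD_MASK << RXTS_SHIFT)
    bits = (txts << TXTS_SHIFT) | (rxts << RXTS_SHIFT)
    return clrsetbits(reg, mask, bits)


def get_dma_fifo_sizes(reg: int) -> tuple[int, int]:
    """Extract (TXTS, RXTS) from reg using the assumed layout."""
    return (reg >> TXTS_SHIFT) & FIELD_MASK, (reg >> RXTS_SHIFT) & FIELD_MASK


if __name__ == "__main__":
    before = 0x00004400  # example value with both assumed fields at 0x0
    after = set_dma_fifo_sizes(before, txts=0x3, rxts=0x3)
    print(hex(before), "->", hex(after))  # prints: 0x4400 -> 0x4433
```

With the assumed layout, the example value 0x4400 becomes 0x4433 while every other bit is left untouched; clearing the field mask before OR-ing in the new values is what protects the unrelated bits.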

I tried to look into the Allwinner A20 public documentation, but I could not find anything about P0DMACR or much detail about the SATA registers, as only the SATA clock appears to be documented. Maybe that explains why it took 7 years to fix this performance issue…

Igor of Armbian tested the patch on a Cubietruck with the more reliable iozone benchmark, and the results look great:


A sequential write speed of 38875 KB/s with Linux 4.19 jumped to 127084 KB/s after applying this one-line patch. It’s great, and there do not seem to be any side effects so far. The patch is fairly new, so more testing may be needed. If you are running Armbian and can wait a bit, you won’t need to apply the patch yourself since it is part of Armbian’s Debian & Ubuntu releases. Uenal Mutlu also submitted his patch to the Linux kernel mailing list, so it should be part of Linux 5.2.
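
As a quick sanity check on the "three times faster" wording: a speedup factor N corresponds to an improvement of (N − 1) × 100%, so 3× is +200%. The small Python helper below (function names are our own) applies this to Igor's iozone numbers, giving about 3.27× or roughly +227%; the patch author's 36 MB/s to 120 MB/s figure works out to about 3.33× (+233%).

```python
def speedup(before: float, after: float) -> float:
    """Ratio of new throughput to old throughput."""
    if before <= 0:
        raise ValueError("baseline throughput must be positive")
    return after / before


def improvement_percent(before: float, after: float) -> float:
    """Relative gain in percent: (after - before) / before * 100."""
    return (speedup(before, after) - 1.0) * 100.0


if __name__ == "__main__":
    # iozone sequential write on Cubietruck with Linux 4.19, in KB/s (from the article)
    before, after = 38875, 127084
    # prints: 3.27x, +227%
    print(f"{speedup(before, after):.2f}x, +{improvement_percent(before, after):.0f}%")
```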

34 Comments
willy
4 years ago

It’s great, we all remember how the A20’s SATA performance was not impressive! The only sad thing is that nobody from Allwinner cared to have a look at this issue while these chips were still relevant several years ago, when everyone was bashing them for their performance.

Roger
4 years ago

The A20 is still relevant in many ways. It’s still being produced and used in new boards. It’s become one of the arm chips with the best mainline kernel support. And SATA support makes it stand out.

willy
4 years ago

Sure, new boards are still being built. But considering how power-hungry this 40nm chip was for only 2×1008 MHz, I suspect these boards definitely focus on neither performance nor efficiency, and might even continue to rely on the horrible, outdated 3.4.39 kernel for legacy reasons, so they will not even benefit from this fix anyway (just like the thousands that were issued since).

tkaiser
4 years ago

It’s not only sequential write performance that improves; random IO benefits a lot too, see the doubled numbers at 16K block size. What’s needed now are testers who
* use btrfs to run some really intensive HDD and SSD tests 24/7 (iozone in a loop, for example)
* test Uenal’s patch also with Allwinner R40/V40 (BPi M2 Ultra/Berry)
* test the patch on A10 devices

Using a ‘checksummed’ filesystem for reliability tests is important since even minor data corruption will be reported. And while this is a nice improvement for A10/A20 due to the limited CPU performance of these old…
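
The idea behind a checksummed filesystem is that silent corruption gets detected rather than passed through. As a hedged illustration of the same idea at the application level (this is not btrfs itself, and not the test procedure used in the thread), the Python sketch below writes files of random data, records a SHA-256 digest for each, and re-reads them to look for mismatches. The function name, file sizes and target directory are hypothetical. Because the read-back may be served from the page cache rather than the drive, a real soak test would need a data set larger than RAM or a cache drop between the write and verify phases.

```python
import hashlib
import os
import tempfile


def write_verify_pass(directory: str, n_files: int = 4,
                      size: int = 8 * 1024 * 1024,
                      chunk: int = 1024 * 1024) -> list[str]:
    """Write n_files of random data, re-read them and compare SHA-256 digests.

    Returns the paths whose read-back digest did not match; files are deleted
    after verification.
    """
    expected = {}
    for i in range(n_files):
        path = os.path.join(directory, f"verify_{i}.bin")
        h = hashlib.sha256()
        with open(path, "wb") as f:
            remaining = size
            while remaining > 0:
                block = os.urandom(min(chunk, remaining))
                h.update(block)
                f.write(block)
                remaining -= len(block)
            f.flush()
            os.fsync(f.fileno())  # push the data to the device before verifying
        expected[path] = h.hexdigest()

    bad = []
    for path, digest in expected.items():
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk), b""):
                h.update(block)
        if h.hexdigest() != digest:
            bad.append(path)
        os.remove(path)
    return bad


if __name__ == "__main__":
    # Hypothetical target: replace with a directory on the SATA drive under test
    # and wrap this in a loop for a multi-day run.
    with tempfile.TemporaryDirectory() as target:
        mismatches = write_verify_pass(target, n_files=2)
        print(f"{len(mismatches)} mismatches")
```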

Diego
4 years ago

How about using dm-integrity instead of btrfs? Might be slightly lighter on CPU?

tkaiser
4 years ago

No idea, never used dm-integrity. At least using btrfs for this use case is really easy. Attach a SATA device and get the device node (assuming /dev/sda) and then it’s just

Checking dmesg output for data corruption issues and a final btrfs scrub is mandatory of course.
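
Part of the dmesg check can be automated. The Python sketch below scans kernel log text for lines that look like btrfs checksum failures or libata errors. The patterns are assumptions based on common kernel messages, not an exhaustive list, so a clean result only means "nothing obvious"; the final btrfs scrub is still the real integrity check. Function names are hypothetical, and reading the log may require root on some systems.

```python
import re
import subprocess

# ASSUMED patterns for kernel messages worth a closer look; extend as needed.
SUSPICIOUS = [
    re.compile(r"BTRFS.*(csum failed|checksum error|corrupt)", re.IGNORECASE),
    re.compile(r"ata\d+(\.\d+)?: .*(exception Emask|failed command|hard resetting link)"),
    re.compile(r"I/O error", re.IGNORECASE),
]


def suspicious_lines(log_text: str) -> list[str]:
    """Return the log lines that match any of the suspicious patterns."""
    return [line for line in log_text.splitlines()
            if any(p.search(line) for p in SUSPICIOUS)]


def read_dmesg() -> str:
    """Read the kernel ring buffer by running the dmesg command."""
    return subprocess.run(["dmesg"], capture_output=True, text=True,
                          check=True).stdout


if __name__ == "__main__":
    try:
        hits = suspicious_lines(read_dmesg())
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"could not read dmesg: {exc}")
    else:
        for line in hits:
            print(line)
        print(f"{len(hits)} suspicious line(s)")
```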

Diego
4 years ago

I recently stumbled over this: https://gist.github.com/MawKKe/caa2bbf7edcc072129d73b61ae7815fb

and also integritysetup while I was looking for useful alternatives to ZFS and btrfs; it’s also what Red Hat suggests.

tkaiser
4 years ago

That’s indeed interesting. I’ll do some tests when time permits, especially comparing CPU utilization. But to be honest: one of the killer features of both ZFS and btrfs is snapshot handling (especially being able to send them to other disks/devices via send/receive commands). With LVM-based snapshots, at least in the past, performance dropped significantly once there were a few snapshots. But maybe this has changed in the meantime?

Diego
4 years ago

I don’t know about any recent improvements, as I’m new to this. But I agree that send and receive are very interesting. I’m still planning to build an experimental NAS, but as usual, time and to a certain extent also money…

roel
4 years ago

Will this also apply to the A10 Cubieboard? Or only to A20 SoCs?

tkaiser
4 years ago

Just built an Armbian/Buster image (4.19.38) to test on an Olimex Lime (A10). Close to 100 MB/s with sequential reads and writes on a btrfs filesystem on a Samsung EVO750. CPU utilization is 100% when reading and ~80% when writing; the old and boring single-core A8 has become the bottleneck for this use case.

Now starting with reliability / data integrity testing 24/7 for the next few days…

tkaiser
4 years ago

> Now starting with reliability / data integrity testing 24/7 for the next few days…

After 3 days of continuous stress testing a btrfs filesystem on the A10 Lime, there was no performance degradation and no data corruption. While I used different data access patterns (see https://irclog.whitequark.org/linux-sunxi/2019-05-13#24595330 for details), I’m not sure this covers all corner cases.

willy
4 years ago

They probably don’t care because they target STBs, tablets, and various embedded devices where this is not important enough. For sure, some advanced users know the devices will not match their expectations and will not buy them. But this may represent far less than 1% of their market and they don’t care at all. They could have at least looked at the cause of the low performance and fixed it themselves to save their image, but they did not even do that.

tkaiser
4 years ago

I believe Allwinner in general didn’t take much care about anything that happened outside their traditional sales channels in the beginning. AW management might even have considered linux-sunxi as enemies in the past (the whole ‘GPL violations’ show), and as far as I know this changed only recently. Now there’s even one AW employee dedicated to open source who sporadically contributes directly to the linux-sunxi wiki with documentation and answers the community’s questions. It would be great to establish a contact between Uenal and Wink to probably further improve SATA (maybe more board makers will then pick up R40/V40 or A40i…

Igor Pecovnik
4 years ago

@Jean-Luc The improvement is actually already in all Armbian Debian/Ubuntu releases where the kernel is 4.19.y, not just in the upcoming Debian 10.

xnc-hardware
4 years ago

Please, please, this should be tagged “Fixes:” for the stable 4.19 kernel (Debian 10 Buster).

Igor Pecovnik
4 years ago

It’s already fixed in Armbian Buster 🙂

tkaiser
4 years ago

He asked for a stable OS.

Igor Pecovnik
4 years ago

Oh. Then Armbian Stretch, with 300+ (mainly Allwinner-related) patches/improvements over the generic 4.19.y kernel … or Jessie, Xenial, Bionic and Disco user space with the same kernel.

David Willmore
4 years ago

Buster isn’t the ‘stable’ release of Debian, Stretch is.

zoobab
4 years ago

Mele A1000, I still have one collecting dust! Good memories!

zoobab
4 years ago

“I tried to look into Allwinner A20 public documentation, but I could not find anything about P0DMACR or much details about SATA registers, as only the SATA clock appears to be documented. Maybe that explains why it took 7 years to fix this performance issue…”

When do we get a law to impose penalties on companies not documenting their chips properly?

FFII looked at this issue back in 1997, and the situation has not changed much.

As long as those manufacturers don’t face substantial fines for providing bad documentation, the debate is not gonna go anywhere.

dgp
4 years ago

Legislation will have the exact opposite effect: Instead of getting documentation* you just won’t have access to the bounty of cheap Chinese chips anymore.

*Assuming Allwinner et al. actually have the documentation you want. They all buy IP blocks and hack up the provided reference code until it works. In a lot of cases they probably have no idea how it works either.

zoobab
4 years ago

And if devices with such undocumented chips cannot be sold on European market, what will they do? Ignore the European market?

willy
4 years ago

Probably, just like some American web sites are now closed to Europeans thanks to the stupid GDPR.

dgp
4 years ago

> And if devices with such undocumented chips cannot be sold on European market, what will they do?

Find some loophole that allows some category of product to bypass the regulations (e.g. old products being grandfathered in) to keep selling, or do what they’ve been doing all along and just ignore the regulations altogether.

> Ignore the European market?

That would be the ultimate outcome, and the thing is it doesn’t really hurt them, as they never gave a shit about regulations, the GPL etc. It hurts all of the companies in the EU that benefit from not having to pay NXP…

Canta
4 years ago

A very useful patch.
I have a pcDuino A20 and a Cubietruck A20.
They were useless with slow SATA writes, so I just put them in a box for storage.
Now I’m thinking of reusing them for a portable Nextcloud with a cheap DRAM-less SSD.

I’m going to try with Debian Buster (and check whether CentOS 7 armv7 already has the patch or not).

roel
4 years ago

I wonder where all these A10/A20-based boards are. It seems everybody threw them away. On the LibreELEC forum there is also no reaction to the request for testing A10/A20 images.
You don’t find them on second-hand sites either, so if somebody (in Europe) still has a BPI-R1/Lamobo R1 for sale at a reasonable price…

theguyuk
4 years ago

Lots of A20 boards on Amazon too, and on Aliexpress the A20 sells as a thin client, from several sources.

tkaiser
4 years ago

> I wonder where all this A10/A20 based boards are

Here they are: https://geizhals.de/?cat=mbarm&xf=3749_Allwinner%7E8195_1
If the aforementioned patch also improves SATA performance on R40/V40, currently the most interesting board for such use cases might be the BPi M2 Berry (only downside: prone to underpowering thanks to crappy Micro USB for power).

tkaiser
4 years ago

> BPI-R1/Lamobo R1

Great example for a company ‘interacting’ with community: http://forum.banana-pi.org/t/new-hardware-revision-of-r1/4550 (Nora Lee is Banana Pi product manager at Foxconn, Lion Wang is SinoVoip CEO)

itchy n scratchy
4 years ago

LOL
