If you’ve been following this blog long enough, you may remember that all linux-sunxi community work aiming at improving U-Boot and Linux software support on Allwinner processors started with the Allwinner A10 processor found in the MeLE A1000 TV box back in 2012, which at the time provided an interesting alternative to the Raspberry Pi board that was in short supply at launch and for several months after.
One of the most interesting features of the Allwinner A10 single-core Arm Cortex-A8 processor was its SATA interface, and the Allwinner A20 was announced a few months later with a dual-core Cortex-A7 processor and virtually the same peripherals as the Allwinner A10, including SATA. However, when I tested the CubieTruck board connected to a mechanical drive, I noticed sequential SATA performance was fine for reads (~180 MB/s), but writes were fairly slow at around 36 MB/s.
Other people complained about it, and some looked into it, and at one point it appeared the maximum SATA write performance for Allwinner A10/A20 was 45 MB/s due to either buggy silicon or driver problems.
Most of us are not familiar with Allwinner SATA DMA registers, but luckily the patch explains what’s going on here:
Increasing the SATA/AHCI DMA TX/RX FIFOs (P0DMACR.TXTS and .RXTS) from default 0x0 each to 0x3 each boosts write performance to 120MB/s, from a lame 36MB/s to 45MB/s previously. Read performance is about 200MB/s [tested on SSD using dd bs=4K count=512K].
Tested on the Banana Pi R1 (aka Lamobo R1) and Banana Pi M1 SBCs with Allwinner A20 32bit-SoCs (ARMv7-a / arm-linux-gnueabihf).
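To make the change more concrete, here is a minimal Python sketch of the kind of clear-and-set register update the patch describes. The bit positions are an assumption (TXTS in bits 3:0, RXTS in bits 7:4), since the patch text quoted here only names the fields. The helper names and the starting register value are hypothetical as well.

```python
# Sketch of the P0DMACR update described above.
# ASSUMPTION: TXTS occupies bits 3:0 and RXTS bits 7:4. The patch text quoted
# in this article only names the fields, so these positions are illustrative.
TXTS_SHIFT = 0
RXTS_SHIFT = 4
FIELD_MASK = 0xF


def clrsetbits(value: int, clr_mask: int, set_bits: int) -> int:
    """Clear the bits in clr_mask, then OR in set_bits (32-bit register)."""
    return ((value & ~clr_mask) | set_bits) & 0xFFFFFFFF


def set_transaction_sizes(p0dmacr: int, txts: int, rxts: int) -> int:
    """Return a new P0DMACR value with the TXTS and RXTS fields replaced."""
    mask = (FIELD_MASK << TXTS_SHIFT) | (FIELD_MASK << RXTS_SHIFT)
    bits = ((txts & FIELD_MASK) << TXTS_SHIFT) | ((rxts & FIELD_MASK) << RXTS_SHIFT)
    return clrsetbits(p0dmacr, mask, bits)


if __name__ == "__main__":
    before = 0x00004400  # hypothetical value with TXTS = RXTS = 0x0 (the default)
    after = set_transaction_sizes(before, txts=0x3, rxts=0x3)
    print(f"P0DMACR: {before:#010x} -> {after:#010x}")
```

Only the two transaction-size fields are touched; every other bit of the register is carried over unchanged, which is why a fix like this can fit in a single line of driver code.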
I tried to look into Allwinner A20 public documentation, but I could not find anything about P0DMACR or many details about SATA registers, as only the SATA clock appears to be documented. Maybe that explains why it took 7 years to fix this performance issue…
Igor of Armbian tested the patch on Cubietruck with the more reliable iozone benchmark, and the results look great:
Samsung SSD 840 Pro 256 GB @ Cubietruck
iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2
    kB  reclen   write  rewrite    read  reread  random read  random write
102400       4   10714    15285   31921   32280        16328         14767
102400      16   21757    25767   57812   58010        45695         25201
102400     512   33403    32429  128245  116062       109591         33595
102400    1024   34846    35240  129965  131121       129515         35227
102400   16384   37895    37918  207564  204627       204340         38019

Kernel 4.19.y with SATA improvement patch
    kB  reclen   write  rewrite    read  reread  random read  random write
102400       4   22876    32704   37686   39143        22571         30990
102400      16   54254    69325   94749   97225        61354         68529
102400     512  110670   113325  190346  163677       186012        112679
102400    1024  113971   115928  206044  207406       184936        115069
102400   16384  127084   127588  243400  253305       252148        127611

Kernel 4.19.y without the patch
    kB  reclen   write  rewrite    read  reread  random read  random write
102400       4   18053    22336   45249   46338        24860         22292
102400      16   30692    32188  106052  106577        71526         32746
102400     512   39632    39978  186433  185444       178097         39939
102400    1024   39860    40163  189900  191076       188446         40098
102400   16384   38875    41508  241939  244088       243405         41314
A sequential write of 38875 KB/s with Linux 4.19 was vastly improved to 127084 KB/s by applying this one-line patch. It’s great, and there do not seem to be any side effects so far. The patch looks fairly new, so more testing may be needed. If you are running Armbian and can wait a bit, you won’t need to apply the patch yourself since it is part of Debian & Ubuntu releases. Uenal Mutlu also submitted his patch to the Linux Kernel mailing list, so it should be part of Linux 5.2.
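As a quick sanity check on the “tripled” claim, the short Python sketch below computes the write speedup per record size. The numbers are copied from the iozone write column above; it assumes the block containing 38875 KB/s is the unpatched Linux 4.19 run, as the paragraph above states. The function and variable names are only illustrative.

```python
# iozone write throughput in KB/s, copied from the results above.
# Keys are iozone record sizes in KB.
UNPATCHED_WRITE = {4: 18053, 16: 30692, 512: 39632, 1024: 39860, 16384: 38875}
PATCHED_WRITE = {4: 22876, 16: 54254, 512: 110670, 1024: 113971, 16384: 127084}


def speedups(before: dict, after: dict) -> dict:
    """Return the after/before throughput ratio for each record size."""
    return {reclen: after[reclen] / before[reclen] for reclen in before}


if __name__ == "__main__":
    for reclen, ratio in speedups(UNPATCHED_WRITE, PATCHED_WRITE).items():
        print(f"{reclen:>6} KB records: {ratio:.2f}x")
```

The gain grows with the record size: it is modest for 4 KB records and exceeds a factor of three for 16384 KB records, the most sequential case in this test.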
Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager, and starting to write daily news, and reviews full time later in 2011.
34 Replies to “How One Line of Code Tripled Allwinner A20 SATA Write Performance”
It’s great, we all remember how A20’s SATA performance was not impressive! The only sad thing is that nobody from Allwinner cared to have a look at this issue while these chips were still relevant several years ago, when everyone was bashing them for their performance.
The A20 is still relevant in many ways. It’s still being produced and used in new boards. It’s become one of the arm chips with the best mainline kernel support. And SATA support makes it stand out.
Sure, new boards are still being built. But considering how power-hungry this 40nm chip was for only 2×1008 MHz I suspect these boards definitely do not focus on performance nor efficiency, and might even continue to rely on the outdated 3.4.39 horrible kernel for legacy reasons, so they will not even benefit from this fix anyway (just like the thousands that were issued since).
It’s not only sequential write performance improving but also random IO benefitting a lot, see doubled numbers at 16K block size. What’s needed now are testers who
* use btrfs to run some really intensive HDD and SSD tests 24/7 (iozone in a loop for example)
* test Uenal’s patch also with Allwinner R40/V40 (BPi M2 Ultra/Berry)
* test the patch on A10 devices
Using a ‘checksummed’ filesystem for reliability tests is important since even minor data corruption will be reported. And while this is a nice improvement for A10/A20, due to the limited CPU performance of these old Allwinner SoCs and the CPU becoming a bottleneck here pretty early, the most interesting target of this patch might be the quad-core successor, unfortunately only used on Bananas.
How about using dm-integrity instead of btrfs? Might be slightly lighter on CPU?
No idea, never used dm-integrity. At least using btrfs for this use case is really easy. Attach a SATA device and get the device node (assuming /dev/sda) and then it’s just
dmesg output for data corruption issues and a final btrfs scrub is mandatory of course.
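The exact commands from this comment did not survive, so what follows is only a sketch of what such a test could look like, not the commenter’s original procedure. It builds a list of steps: create a btrfs filesystem, mount it, run iozone in a loop (reusing the invocation from the article), then run a scrub and dump dmesg for inspection. The mount point, loop count, and function names are hypothetical, and the first step wipes the device, so the script only prints the commands unless run() is called explicitly.

```python
import subprocess

# iozone invocation taken from the benchmark earlier in the article.
IOZONE = ["iozone", "-e", "-I", "-a", "-s", "100M",
          "-r", "4k", "-r", "16k", "-r", "512k", "-r", "1024k", "-r", "16384k",
          "-i", "0", "-i", "1", "-i", "2"]


def stress_plan(device="/dev/sda", mountpoint="/mnt/sata-test", loops=3):
    """Build (argv, cwd) steps for a DESTRUCTIVE btrfs test on `device`."""
    steps = [
        (["mkfs.btrfs", "-f", device], None),   # wipes the device
        (["mount", device, mountpoint], None),  # mountpoint must already exist
    ]
    steps += [(IOZONE, mountpoint)] * loops      # iozone runs on the new filesystem
    steps += [
        (["btrfs", "scrub", "start", "-B", mountpoint], None),  # verify all checksums
        (["dmesg"], None),                       # review manually for BTRFS errors
    ]
    return steps


def run(steps):
    """Execute the steps in order, stopping at the first failure (needs root)."""
    for argv, cwd in steps:
        subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    # Dry run by default: print the commands instead of executing them.
    for argv, cwd in stress_plan():
        print(" ".join(argv), f"(in {cwd})" if cwd else "")
```

Keeping the plan separate from its execution makes it easy to review the destructive steps before anything touches the disk.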
I recently stumbled over this: https://gist.github.com/MawKKe/caa2bbf7edcc072129d73b61ae7815fb
and integritysetup when I was looking for useful alternatives for zfs and btrfs, it’s also what RH is suggesting.
That’s indeed interesting. I’ll do some tests when time permits comparing especially CPU utilization. But to be honest: one of the killer features of both ZFS and btrfs is snapshot handling (especially being able to send them to other disks/devices via send/receive commands). With LVM based snapshots at least in the past performance dropped significantly once there are a few snapshots. But maybe this changed too in the meantime?
I don’t know about any recent improvements, as I’m new to this. But I agree that send and receive is very interesting. I’m still planning to build some experimental NAS but as usual time and to a certain extent also money…
Will this also apply to the Cubieboard A10? Or only A20 SoCs?
Just built an Armbian/Buster image (4.19.38) to test on Olimex Lime (A10). Close to 100 MB/s with sequential reads and writes on a btrfs on a Samsung EVO750. CPU utilization when reading 100%, with writes it’s ~80% — the old and boring single core A8 has become a bottleneck with this use case.
Now starting with reliability / data integrity testing 24/7 for the next few days…
> Now starting with reliability / data integrity testing 24/7 for the next few days…
After 3 days of continuous stress testing a btrfs filesystem on A10 Lime no performance degradation and no data corruption issues. While I used different data access patterns (see https://irclog.whitequark.org/linux-sunxi/2019-05-13#24595330; for details) I’m not sure this covers all corner cases.
I can see more details about the patch and registers now.
Simply amazing. Uenal Mutlu (the developer) had to look into Texas Instruments documentation in order to understand/guess how Allwinner’s registers work.
Allwinner is just shooting itself in the foot by not releasing proper documentation. I wonder if they lost any sales due to their SATA performance issue.
They probably don’t care because they target STBs, tablets, and various embedded devices where this is not important enough. For sure, some advanced users know the devices will not match their expectations and will not buy them. But this may represent far less than 1% of their market and they don’t care at all. They could have at least looked at the cause for the low performance and fixed it themselves to save their image, but they did not even do that.
I would believe Allwinner in general didn’t take much care about anything that happened outside their traditional sales channels in the beginning. AW management even might have considered linux-sunxi as enemies in the past (the whole ‘GPL violations’ show) and as far as I know this changed just recently.
Now there’s even one AW employee dedicated to open source who sporadically contributes directly to linux-sunxi wiki with documentation and answers community’s questions. It would be great to establish a contact between Uenal and Wink to probably further improve SATA (maybe more board makers will then pick up R40/V40 or A40i to create inexpensive boards with native SATA now that performance is perfectly fine for spinning rust — but even then this is still ‘far less than 1% of their market’).
@Jean-Luc Improvement is actually already in all Armbian Debian/Ubuntu releases where kernel is 4.19.y not just in upcoming Debian 10.
Please, please this should be “fixes: ” for the stable 4.19 kernel (Debian 10 Buster).
It’s already fixed in Armbian Buster 🙂
He asked for a stable OS.
Oh. Then Armbian Stretch with 300+, mainly Allwinner related, patches/improvements over generic 4.19.y kernel … or Jessie, Xenial, Bionic and Disco user space with the same kernel.
Buster isn’t the ‘stable’ release of Debian, Stretch is.
MeLE A1000, I still have one gathering dust! Good memories!
“I tried to look into Allwinner A20 public documentation, but I could not find anything about P0DMACR or much details about SATA registers, as only the SATA clock appears to be documented. Maybe that explains why it took 7 years to fix this performance issue…”
When do we get a law to impose penalties on companies not documenting their chips properly?
FFII looked at this issue back in 1997, and the situation has not changed much.
As long as those manufacturers don’t face substantial fines for providing bad documentation, the debate is not gonna go anywhere.
Legislation will have the exact opposite effect: Instead of getting documentation* you just won’t have access to the bounty of cheap Chinese chips anymore.
*Assuming Allwinner et al. actually have the documentation you want. They all buy IP blocks and hack up the provided reference code until it works. In a lot of cases they probably have no idea how it works either.
And if devices with such undocumented chips cannot be sold on European market, what will they do? Ignore the European market?
Probably, just like some American web sites are now closed to Europeans thanks to the stupid GDPR.
>And if devices with such undocumented chips cannot be sold on European market, what will they do?
Find some loophole that allows some category of product to bypass the regulations, i.e. old products being grandfathered in, to keep selling or do what they’ve been doing all along and just ignore the regulations altogether.
>Ignore the European market?
That would be the ultimate outcome, and the thing is it doesn’t really hurt them as they never gave a shit about regulations, the GPL etc. It hurts all of the companies in the EU that benefit from not having to pay NXP or TI premium prices, like startups trying to make their way onto the market by remortgaging the founders’ homes. Bigger companies like Amazon and Google that ship undocumented 3.x kernel junk from Mediatek etc. will just litigate their way around it or chalk the fines up to the cost of doing business.
A very useful patch.
I have a pcDuino A20 and a Cubietruck A20.
They were useless with slow SATA writing, so I just put them in my box for storage.
I’m thinking again of reusing them for my portable Nextcloud with a cheap DRAM-less SSD.
Going to try with Debian Buster (and going to check whether CentOS 7 armv7 already has the patch or not).
I wonder where all these A10/A20 based boards are. Seems everybody threw them away. On the LibreELEC forum there is also no reaction to the request for testing A10/A20 images.
You don’t find them on second-hand sites either, so if somebody (in Europe) still has a BPI-R1/Lamobo R1 for sale for a reasonable price…
Olimex is still selling their OlinuXino boards, and they just wrote a blog post about their Allwinner boards to explain why they haven’t made new boards with other chips so far.
Lots of A20 boards on Amazon too, and on Aliexpress the A20 sells as a thin client, from several sources.
> I wonder where all this A10/A20 based boards are
Here they are: https://geizhals.de/?cat=mbarm&xf=3749_Allwinner%7E8195_1 — if the aforementioned patch also improves SATA performance on R40/V40 currently the most interesting board for such use cases might be the BPi M2 Berry (only downside: prone to underpowering thanks to crappy Micro USB for power).
> BPI-R1/Lamobo R1
Great example for a company ‘interacting’ with community: http://forum.banana-pi.org/t/new-hardware-revision-of-r1/4550 (Nora Lee is Banana Pi product manager at Foxconn, Lion Wang is SinoVoip CEO)