Facebook Zstandard “zstd” & “pzstd” Data Compression Tools Deliver High Performance & Efficiency

Ubuntu 16.04 and, I assume, other recent operating systems still use single-threaded versions of file and data compression utilities such as bzip2 or gzip by default. I’ve recently learned that compatible multi-threaded compression tools such as lbzip2, pigz or pixz have been around for a while, and that you can replace the default tools with them for much faster compression and decompression on multi-core systems. That post led to further discussion about Facebook’s Zstandard 1.0, which promises both smaller files and faster compression. The implementation is open source, released under a BSD license, and offers both zstd, a single-threaded tool, and pzstd, a multi-threaded tool. So we all started to do our own little tests and were impressed by the results. Some concerns were raised about patents, and development is still a work in progress, with a few bugs here and there, including pzstd segfaulting on ARM.

Zstd vs Zlib Compression Ratio vs Speed

Zlib has 9 levels of compression, while Zstd has 19, so Facebook tested every compression level of both and plotted compression speed against compression ratio for all test points in the chart above. Zstd is clearly superior to zlib here.

They’ve also compared compression speed, decompression speed and compression ratio for various competing fast algorithms, using lzbench to run the tests from memory and avoid I/O bottlenecks from storage devices.

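| Name | Ratio | C.speed (MB/s) | D.speed (MB/s) |
|---|---|---|---|
| zstd 1.0.0 -1 | 2.877 | 330 | 940 |
| zlib 1.2.8 -1 | 2.730 | 95 | 360 |
| brotli 0.4 -0 | 2.708 | 320 | 375 |
| QuickLZ 1.5 | 2.237 | 510 | 605 |
| LZO 2.09 | 2.106 | 610 | 870 |
| LZ4 r131 | 2.101 | 620 | 3100 |
| Snappy 1.1.3 | 2.091 | 480 | 1600 |
| LZF 3.6 | 2.077 | 375 | 790 |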

Again everything is a compromise, but Zstd is faster than algorithms with a similar compression ratio, and has a higher compression ratio than the faster algorithms.

But let’s not just trust Facebook, and instead try it ourselves. The latest release is version 1.1.2, so that’s what I tried on my Ubuntu 16.04 machine:
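A minimal sketch of a source install, assuming the release is fetched as a GitHub tarball and that wget, make and a C compiler are already present. The `install_zstd` helper name is hypothetical and is only there so that nothing runs until it is called; the commands behind the results in this post may have differed.

```shell
# Hypothetical helper: download, build and install a tagged zstd release.
# Assumes the GitHub tag is v<version> and the tarball unpacks to zstd-<version>/.
install_zstd() {
    version="${1:-1.1.2}"
    wget "https://github.com/facebook/zstd/archive/v${version}.tar.gz" \
        -O "zstd-${version}.tar.gz" &&
    tar -xzf "zstd-${version}.tar.gz" &&
    make -C "zstd-${version}" &&                # builds the library and the zstd tool
    sudo make -C "zstd-${version}" install      # installs under /usr/local by default
}

# Usage: install_zstd 1.1.2
```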


This will install the latest stable release of zstd on your system, but the multi-threaded tool, pzstd, is not built by default:
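pzstd lives in the contrib/pzstd directory of the zstd source tree and needs a C++11 compiler. A hedged sketch of building it, assuming the source was unpacked into a directory named `zstd-1.1.2` and that the contrib Makefile exposes a `pzstd` target; the `build_pzstd` helper name and the install location are assumptions.

```shell
# Hypothetical helper: build pzstd from an unpacked zstd source tree
# and copy the binary somewhere on the PATH.
build_pzstd() {
    src="${1:-zstd-1.1.2}"
    make -C "$src/contrib/pzstd" pzstd &&
    sudo cp "$src/contrib/pzstd/pzstd" /usr/local/bin/
}

# Usage: build_pzstd zstd-1.1.2
```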


There are quite a lot of options for zstd:
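The full list is printed by `zstd -h`, with the advanced options under `zstd -H`. As a short sketch of a few common ones, the hypothetical `zstd_roundtrip` helper below compresses a file at level 19, tests the result and checks that it decompresses back to the original.

```shell
# Hypothetical helper: compress, verify and round-trip a single file.
zstd_roundtrip() {
    file="$1"
    zstd -19 -q -f "$file" -o "$file.zst" &&      # -19: level, -q: quiet, -f: overwrite, -o: output name
    zstd -t -q "$file.zst" &&                     # -t: test the integrity of the compressed file
    zstd -d -q -c "$file.zst" | cmp -s - "$file"  # -d: decompress, -c: write to stdout
}

# Usage: zstd_roundtrip somefile && echo "round trip OK"
```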


Since we are going to compare results with other tools, I’ll also flush the file cache before each compression and decompression using:
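On Linux, the usual way to do this is to sync and then write 3 to /proc/sys/vm/drop_caches, which drops the page cache plus dentries and inodes. A minimal sketch, wrapped in a hypothetical `flush_cache` helper because it needs root:

```shell
# Hypothetical helper: flush dirty pages to disk, then drop the kernel caches.
flush_cache() {
    sync
    echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
}

# Usage: flush_cache   (run before each timed compression or decompression)
```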


I’ll use the default settings to compress the Linux mainline directory stored on a hard drive, first with tar + zstd (single thread), and then with tar + pzstd (multiple threads):
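Both runs follow the same pattern: tar streams the directory and the compressor reads it from standard input. A minimal sketch with a hypothetical `compress_dir` helper; the directory and archive names in the usage lines are placeholders, and it assumes pzstd, like zstd, writes to standard output when it reads from standard input.

```shell
# Hypothetical helper: archive a directory with tar and pipe it through a compressor.
# $1 = compressor command, $2 = directory, $3 = output file
compress_dir() {
    tool="$1"; dir="$2"; out="$3"
    # $tool is left unquoted on purpose so that extra flags can be passed with it
    tar -cf - "$dir" | $tool > "$out"
}

# Usage (bash, where `time` can time a shell function):
#   time compress_dir zstd  linux linux.tar.zst
#   time compress_dir pzstd linux linux.tar.pzst
```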


Bear in mind that some time is lost to I/O on the hard drive, but I wanted to test a real use case here. If you want to compare the raw performance of the compressors specifically, you should use lzbench. Now let’s decompress the Zstandard tarballs:
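Decompression is the same pipeline in reverse: the tool decompresses to standard output with `-d -c`, and tar unpacks the stream. A minimal sketch with a hypothetical `extract_archive` helper; the file and directory names in the usage lines are placeholders.

```shell
# Hypothetical helper: decompress an archive to stdout and unpack it with tar.
# $1 = decompressor command, $2 = archive, $3 = destination directory
extract_archive() {
    tool="$1"; archive="$2"; dest="$3"
    mkdir -p "$dest" &&
    "$tool" -d -c "$archive" | tar -xf - -C "$dest"
}

# Usage (bash):
#   time extract_archive zstd  linux.tar.zst  out-zstd
#   time extract_archive pzstd linux.tar.pzst out-pzstd
```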


My machine is based on an AMD FX8350 octa-core processor, and by comparing real and user time we can clearly see that the test is mostly I/O bound. I’ve repeated those tests with other multi-threaded tools, as shown in the summary table below.
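The I/O-bound claim can be checked with a quick division: user time divided by real time gives, roughly, the average number of cores kept busy. A minimal sketch using the zstd and pzstd compression timings from the summary table; the `cores_busy` helper name is an assumption, and system time is ignored.

```shell
# Hypothetical helper: average number of busy cores = user time / real time.
cores_busy() {
    awk -v user="$1" -v real="$2" 'BEGIN { printf "%.2f\n", user / real }'
}

cores_busy 91.608 130.056   # zstd:  prints 0.70
cores_busy 86.56 58.929     # pzstd: prints 1.47
```

Single-threaded zstd keeps its one core busy only about 70% of the time, and pzstd uses fewer than 1.5 of the 8 available cores on average, which fits a workload waiting on the hard drive.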

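| Tool | Compression time (s) | Compression “user” time (s) | Decompression time (s) | Decompression “user” time (s) | File size (bytes) | Compression ratio |
|---|---|---|---|---|---|---|
| zstd | 130.056 | 91.608 | 45.124 | 21.26 | 1,881,020,744 | 1.48 |
| pzstd | 58.929 | 86.56 | 38.175 | 23.39 | 1,883,697,296 | 1.48 |
| lbzip2 | 84.216 | 353.84 | 37.109 | 167.416 | 1,855,837,345 | 1.50 |
| pigz | 61.121 | 121.332 | 34.36 | 15.26 | 1,903,915,372 | 1.47 |
| pixz | 177.596 | 1233.88 | 36.24 | 78.116 | 1,782,756,524 | 1.57 |
| pzstd -19 | 275.361 | 1939.536 | 26.85 | 21.832 | 1,794,035,552 | 1.56 |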

I’ve included both “real time” and “user time”, as the latter shows how much CPU time the task has spent on all the cores of the system. If user time is large, the task required lots of CPU power, and if a task completes in about the same amount of “real time” but with a lower “user time”, it was likely more efficient and consumed less power. pixz is the multi-threaded version of xz, which relies on LZMA compression and delivers a high compression ratio at the expense of longer compression time, so I also ran pzstd with level 19 compression to compare:
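A minimal sketch of such a run, assuming the same tar pipeline as before with the level passed to the compressor as `-19`. The `compress_dir_level` helper and the file names are hypothetical.

```shell
# Hypothetical helper: like a plain tar | compressor pipeline, with an explicit level.
# $1 = compressor, $2 = compression level, $3 = directory, $4 = output file
compress_dir_level() {
    tool="$1"; level="$2"; dir="$3"; out="$4"
    tar -cf - "$dir" | "$tool" "-$level" > "$out"
}

# Usage (bash): time compress_dir_level pzstd 19 linux linux-19.tar.pzst
```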


Zstandard’s compression ratio is similar to that of lbzip2 with default settings, but compression with pzstd is noticeably faster (about 59 s against 84 s) and much more power efficient (about 87 s of user time against 354 s). Compared to pigz, the multi-threaded gzip, (p)zstd offers a better compression ratio, again with default settings, and somewhat comparable performance. pixz offers the best compression ratio, but takes a lot more time to compress, and uses more resources to decompress than Zstandard and pigz. pzstd with compression level 19 takes even more time to compress and gets close to pixz’s compression ratio, but has the advantage of being much faster to decompress.
