Brotli Compression Algorithm Combines High Compression Ratio, and Fast Decompression

After Zopfli, Google has now announced and released Brotli, a new compression algorithm to make the web faster, with a ratio compression similar to LZMA, but a much faster decompression, making it ideal for low power mobile devices.

Brotli_vs_lzma_vs_zopfli
Compression Ratio vs Decompression Speed for LZMA, Bzip2, Brotli, Zopfli…

Contrary to Zopfli that is deflate compatible, Brotli is a complete new format, and combines “2nd order context modeling, re-use of entropy codes, larger memory window of past data and joint distribution codes” to achieve higher compression ratios.

Google published some benchmark results comparing Brotli to other common algorithms. Since the company aims to make the web faster, the target is to decrease both downloading time (high compression ratio), and rendering time (fast decompression speed), and Brotli with a quality set to 11 is much better than competitor once both parameters are taken into account.

As you’d expect the source code can also be pulled from Github. So I gave it a quick try myself in an Ubuntu PC, and installed it as follows:


The application usage info is not really detailed, but good enough to get started:


I was a little too optimistic at first, and started by compressing a 2.4GB firmware file using quality 11 with bro, but I quickly found out it would take a very long time, as the compression speed is rather slow… Google white paper shows Brotli compresses 0.5 MB of data per second when quality is set to 11, and they performed the test on a server powered by an Intel Xeon E5-1650 v2 running at 3.5 GHz…

So instead I switched to Linux 4.2 Changelog text file I created a few days ago that’s only 14MB in size.


My machine is based on an AMD FX8350 processor (8 cores @ 4.0 GHz), and it took about 46.6 second to compress the file.
Then I switched to xz which implements LZMA compression, and compression preset 9.


Compression speed is much faster with xz, about 7.5 times faster , but it’s not really a concern for Google, because in their use cases, the file is compressed once, but downloaded and decompressed millions of times. It’s also interesting to note that both tools only used one core to compress the file.

Let’s check the file sizes.


In my case, LZMA compression ratio was also slightly higher than Brotli compression ratio, but that’s only for one file, and Google’s much larger test sample (1,000+ files) shows a slight advantage to Brotli (11) over LZMA (9).

Decompression is much faster than compression in both cases:



Brotli is indeed considerably faster at decompressing than LZMA, about 3 times faster, just as reported in Google’s study.

Share this:

Support CNX Software! Donate via cryptocurrencies, become a Patron on Patreon, or purchase goods on Amazon or Aliexpress

ROCK Pi 4C Plus
Subscribe
Notify of
guest
The comment form collects your name, email and content to allow us keep track of the comments placed on the website. Please read and accept our website Terms and Privacy Policy to post a comment.
20 Comments
oldest
newest
opk
opk
8 years ago

It’s worth noting that, unlike other compression algorithms, brotli uses a static dictionary based on common English, Spanish, HTML etc. That’s not a bad idea but it is sort-of cheating when you’re doing a comparison against other algorithms.

LinAdmin
LinAdmin
8 years ago

@opk

Why do you see it like cheating?
If using a certain dictionary in certain applications proves to be useful, then this is a legitimate approach.

ben
ben
8 years ago

Using the term quality to compress a video file with brotli, kind of thru me off… should be called compression strength or something like that, no?

notzed
notzed
8 years ago

Groan, all i can see is someone rushing to use it for everything all of a sudden and “turning it up to 11”, although that compression speed needs a lot of work.

I also wonder what the decompression memory requirements are (incl fixed overheads) and can it stream the process? And which of those tools isn’t honouring the umask properly – the generated file should have the same permissions.

FWIW try adding overflow:auto to your pre style.

Also a bummer threading isn’t utilised.

-
-
8 years ago

@LinAdmin

LZMA/deflate support custom preset dictionaries. It would’ve probably been fairer if a comparison was made with those algorithms using a similar preset dictionary.

Another thing to note is that Brotli’s max dictionary size is 16MB, which is another facet which makes this algorithm very web focused (as opposed to a more general purpose algorithm like LZMA).

Georgi Marinov (@Sanmayce)

Hi, a good review and good for us all if Brotli manages to boost the browsing. @Jean-Luc Aufranc (CNXSoft) Please run your test with ‘dickens’ (from Silesia Corpus) too, it is one of the most used testdatafiles in compression community for a reason, it represents English texts in its fat book form (2 Bibles). Just use ‘-v’ option in order we all to see what is the Brotli’s decompression speed on your Vishera, on my old Core 2 laptop I reached 145 MB/s, for more info: https://github.com/google/brotli/issues/174 Also, I shared my opinion with Brotli’s team about their current defaults and… Read more »

Georgi Marinov (@Sanmayce)

@Jean-Luc Aufranc (CNXSoft) >I can test it, but could you provide the download link for dickens? Bravo, I very much want to see how “low caliber” CPUs execute them, ‘dickens’ is inhere: http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia http://sun.aei.polsl.pl/~sdeor/corpus/dickens.bz2 >A single script that I can run to provide the results you want would be even nicer. Sure, but I am lame in *nix prompt, so an “estimate” for *nix: Step #1: Download the whole package all-in-one: https://mega.nz/#!c0hBkCYQ!U2irEQPbiig6xcUD0mXIVn7P-suzTcku6COKtkvKHmo Step #2: Download the C source: https://software.intel.com/sites/default/files/managed/6d/27/Nakamichi_Shifune.zip Step #3: Compile the C source: ./gcc -O3 -fomit-frame-pointer Nakamichi_Shifune.c -o Nakamichi_Shifune.elf -D_N_XMM -D_N_prefetch_4096 -D_N_Branchfull Step #4: ./Nakamichi_Shifune.elf dickens.Nakamichi /report Sorry… Read more »

Georgi Marinov (@Sanmayce)

@Jean-Luc Aufranc (CNXSoft) Oh, there is difference between GCC Windows and GCC Linux, maybe GCC is not fully installed. Just comment the line #940, SSE4.1 is actually not used just SSE2: 938 #ifdef _N_XMM 939 #include // SSE2 intrinsics //940 #include // SSE4.1 intrinsics 941 #endif It should work this way. I was lazy to make time reporting portable enough: k = (((float)1000*TargetSize/(clocks2 – clocks1 + 1))); k=k>>20; k=k*Trials; printf(“RAM-to-RAM performance: %d MB/s.\n”, k); If clocks_per_sec is not 1000 as in Windows, the MB/s would be unusable. Anyway, thanks for your FX8350 result, 195MB/s is still far from my ~512MB/s… Read more »

Georgi Marinov (@Sanmayce)

@Jean-Luc Aufranc (CNXSoft) Thank you for wrestling with my code 😉 If you still care, just replace the ‘1000’ with ‘CLOCKS_PER_SEC’ in next three lines and voila: //k = (((float)1000*TargetSize/(clocks2 - clocks1 + 1))); k=k>>20; k=k*Trials; k = (((float)CLOCKS_PER_SEC*TargetSize/(clocks2 - clocks1 + 1))); k=k>>20; k=k*Trials; ... //k = (((float)1000*SourceSize/(clocks2 - clocks1 + 1))); //k=k>>10; k = (((float)CLOCKS_PER_SEC*SourceSize/(clocks2 - clocks1 + 1))); //k=k>>10; ... //j = (float)1000*1024*( 512 )/ ((int) durationGENERIC); j = (float)CLOCKS_PER_SEC*1024*( 512 )/ ((int) durationGENERIC); This is the dump from my MinGW console: D:\Internetdownloads\Nakamichi_Shifune>gcc --version gcc (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 5.1.0 Copyright (C) 2015 Free Software Foundation,… Read more »

Georgi Marinov (@Sanmayce)

Oh man, these lilliput machines are so exciting! I know next to nothing about ARM, yet I see that v7 has 32bit registers whereas v8 has already 64bit, this will hurt Shifune’s decoding speed. The price is lovable, just for me. To avoid waiting the ‘memcpy()’ mumbo-jumbo segment, it can be skipped by adding ‘BandwidthFlag=0;’ that way: BandwidthFlag=0; if (BandwidthFlag) { // Benchmark memcpy() [ ... Please run it again, these little scamps are so sweet&cheap. By the way, by chance I have become the author of fastest hash function (for table lookups), called FNV1a-YoshimitsuTRIAD, with supremacy especially on ARM:… Read more »

Georgi Marinov (@Sanmayce)


Ugh, stupid of me, you said that it stuck still decompressing, don’t know what is the cause, maybe the XMM macro is the culprit, you could comment it and put below it standard memcpy() as below:


#ifdef _N_XMM
//SlowCopy128bit( (const char *)( (uint64_t)(srcLOCAL+1) ), retLOCAL );
memcpy(retLOCAL, (const char *)( (uint64_t)(srcLOCAL+1) ), 16);

Being little endian (as x86) it should work.

Georgi Marinov (@Sanmayce)

@Jean-Luc Aufranc (CNXSoft) >I had only compiled the code with: >gcc -O3 -fomit-frame-pointer Nakamichi_Shifune.c -o Nakamichi_Shifune.elf >maybe that’s why it looped. If this was the full compile line no wonder, you cannot omit ‘-D_N_XMM’, it should be like this: gcc -O3 -fomit-frame-pointer Nakamichi_Shifune.c -o Nakamichi_Shifune.elf -D_N_XMM -march=armv7v >Anyway, I’ve done the modifications with memcpy, and built it as on x86, but it will just segfault as I run the program. Never compiled for ARM, but shouldn’t you specify ARM target in GCC command options?! https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html It says that “not all architectures are recognized” automatically: -march=name This specifies the name of… Read more »

Khadas VIM4 SBC