May 24, 2019 by Jean-Luc Aufranc (CNXSoft) - 15 Comments

Google Pik Image Format Improves on Lossy JPEG and Lossless PNG

JPEG lossy compression is still used for most photos on the Internet, while PNG is still the preferred format for lossless compression. Back in 2010, Google unveiled WebP to improve on both, but it’s only very recently that I started to see a few WebP images on the Internet.

The company has been working on yet another image format with Pik, a lossy/lossless image format designed for high quality and fast decoding.

Butteraugli Heat Map used by Google Pik

Some of the features enabling high quality:

Built-in support for psychovisual modeling via adaptive quantization and XYB color space
4×4..32×32 DCT, AC/DC predictors, chroma from luma, nonlinear loop filter, enhanced DC precision
Full-precision (32-bit float) processing, plus support for wide gamut and high dynamic range

Features allowing fast decoding, at over 1 GB/s multi-threaded:

Parallel processing of large images
SIMD/GPU-friendly, benefits from SSE4 or AVX2
Cache-friendly layout
Fast and effective entropy coding: context modeling with clustering, rANS
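
To give a feel for the rANS entropy coding named in the last item, here is a minimal sketch in Python. It is not Pik's coder: the symbol table `FREQ`, the value of `SCALE_BITS`, and the function names are assumptions made for illustration, the model is static (no context modeling or clustering), and the state is kept as one arbitrary-precision integer rather than being renormalized into a byte stream.

```python
from itertools import accumulate

SCALE_BITS = 4             # assumed toy precision: frequencies sum to 2**SCALE_BITS
TOTAL = 1 << SCALE_BITS

# Hypothetical static model: symbol -> frequency (must sum to TOTAL).
FREQ = {"a": 8, "b": 4, "c": 3, "d": 1}
SYMBOLS = list(FREQ)
# Start of each symbol's slot range inside [0, TOTAL).
CUM = dict(zip(SYMBOLS, accumulate([0] + [FREQ[s] for s in SYMBOLS[:-1]])))


def rans_encode(message):
    """Fold the whole message into a single integer state."""
    x = 1
    # rANS behaves like a stack, so symbols are pushed in reverse order
    # to let the decoder pop them in forward order.
    for s in reversed(message):
        f, c = FREQ[s], CUM[s]
        x = ((x // f) << SCALE_BITS) + (x % f) + c
    return x


def rans_decode(x, length):
    """Pop `length` symbols back out of the state."""
    out = []
    for _ in range(length):
        slot = x & (TOTAL - 1)  # low bits identify the symbol's slot
        s = next(t for t in SYMBOLS if CUM[t] <= slot < CUM[t] + FREQ[t])
        x = FREQ[s] * (x >> SCALE_BITS) + slot - CUM[s]
        out.append(s)
    return "".join(out)
```

Calling `rans_decode(rans_encode("abacabad"), 8)` should give back the original string. Frequent symbols grow the state more slowly than rare ones, which is where the compression comes from.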

Google Pik is royalty-free, and is said to achieve perceptually lossless encodings at about 40% of the JPEG bitrate, and store fully lossless at about 75% of 8-bit PNG size, or 60% of 16-bit PNG size.
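
As a rough illustration of what those ratios would mean for file sizes, the arithmetic can be sketched as below. The percentages are the claims quoted above, not measurements, and the helper name `estimated_pik_size` is made up for this example.

```python
# Ratios quoted in the article, in percent of the original file size.
PIK_VS_JPEG_PERCENT = 40    # perceptually lossless vs. JPEG
PIK_VS_PNG8_PERCENT = 75    # lossless vs. 8-bit PNG
PIK_VS_PNG16_PERCENT = 60   # lossless vs. 16-bit PNG


def estimated_pik_size(original_bytes, percent):
    """Estimate the Pik file size in bytes if the quoted ratio holds."""
    return original_bytes * percent // 100


if __name__ == "__main__":
    print(estimated_pik_size(1_000_000, PIK_VS_JPEG_PERCENT))   # 1 MB JPEG
    print(estimated_pik_size(2_000_000, PIK_VS_PNG8_PERCENT))   # 2 MB 8-bit PNG
```

Real savings will depend on image content and encoder settings.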

The SIMD readme explains Pik leverages SSE4, AVX2, and ARMv8 instructions to improve performance.

You can give it a try by checking out the source code on GitHub. It should be noted that it’s pretty hard to change standards on the Internet, as shown by the WebP, HEIF, and FLIF projects, all of which are said to be technically superior to JPEG.

Via WorksonArm

Jean-Luc Aufranc (CNXSoft)

Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager, and starting to write daily news, and reviews full time later in 2011.

15 Comments

tkaiser

5 years ago

> Google Pil is royalty-free

Typo.

And for those looking for a way to get smaller file sizes for images on the internet without (too much) visual harm, look at https://tinypng.com. The methods they use to shrink PNG and JPEG images are explained there (for JPEG at tinyjpg.com).

jqpabc123

5 years ago

tkaiser

Re: tinypng and tinyjpg

Their methods are very rudimentary yet still very slooow.

PNG: They appear to be using color reduction, combining similar colors.
JPG: They appear to be simply removing metadata that a typical camera/phone places in image files. Aside from this, they utilize a very flowery description of what is already inherent in the JPEG algorithm.

At the same time, they lack the ability to re-scale images, which is crucial for images on the internet. An image file from your phone has millions of pixels, which makes sense if you intend to print a… Read more »

tkaiser

5 years ago

jqpabc123

They remove metadata (which can be quite huge with images dropping out of Photoshop for example) but with PNG they do not ‘reduce’ color but switch to an optimized indexed color space. Wrt JPEG it’s explained what they do (JPEG allows for different compression schemes in different ‘parts’ of the image). As for imageoptimizer.net I would believe the only benefit here is stripping (some) metadata and downscaling and reapplying different compression levels. Also it seems it doesn’t support PNG transparency (back to IE6 ‘capabilities’? Seriously?!) At least for screenshots I really don’t want downscaling and a quick test reveals that… Read more »

Author

Jean-Luc Aufranc (CNXSoft)

5 years ago

tkaiser

For reference, I’m using the EWWW Image Optimizer plugin on this site (https://wordpress.org/plugins/ewww-image-optimizer/), which optimizes the PNG and JPG images that I upload. They also have a (paid) cloud-based solution at https://ewww.io/

blu

5 years ago

Also, there’s this nice simd ISA matrix in the simd section of the project that I find quite useful: https://github.com/google/pik/blob/master/pik/simd/instruction_matrix.pdf

-.-

5 years ago

blu

Is that for their wrapper implementation though? For example, you can definitely do pairwise addition between 8-bit values with one instruction on x86 (pmaddubsw), but the table indicates that it takes 9. Another example is pairwise subtraction on ARM – it definitely doesn’t take 9 instructions to do.
I do see a lot of “9”s on the page – maybe that just means that it’s not implemented in SIMD at all?

It looks better if you ignore the “9”s, but still seems questionable, e.g. integer negation on x86 (8/16/32-bit) is one instruction (psign*), but listed as two on the table.
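
The negation trick mentioned here can be modeled lane by lane in Python. This is a scalar sketch of the `psignb` semantics (negate where the control byte is negative, zero where it is zero, pass through otherwise), not code from the Pik wrapper; the helper names are made up for illustration.

```python
def wrap_s8(x):
    """Wrap an integer into the signed 8-bit range, like a byte lane does."""
    return (x + 128) % 256 - 128


def psignb(dst, src):
    """Scalar model of psignb: per lane, the sign of src controls dst."""
    out = []
    for d, s in zip(dst, src):
        if s < 0:
            out.append(wrap_s8(-d))   # negative control: negate
        elif s == 0:
            out.append(0)             # zero control: clear the lane
        else:
            out.append(d)             # positive control: pass through
    return out


ALL_ONES = [-1] * 16  # an all-ones register reads as -1 in every signed lane


def negate_s8(v):
    """Negate 16 signed bytes with one psignb, given the constant register."""
    return psignb(v, ALL_ONES)
```

The single instruction depends on having the all-ones constant in a register. Note that -128 maps to itself because the result wraps.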

blu

5 years ago

-.-

Yes, it appears to be for their wrapper. And yes, it’s a fuzzy table indeed, where ‘9’ appears to indicate ‘a few’. I haven’t gone over all cells, but those that I’ve glanced at seemed ok. BTW, pmaddubsw does 16-bit saturation of the pairwise results, so it’s not exactly one instruction that does the 8-bit pairwise job, like addp on asimd2 does. Re int negation in ssse3 — it appears they count the mask load as a separate op.

-.-

5 years ago

blu

> BTW, pmaddubsw does 16-bit saturation of the pairwise results

8-bit addition cannot exceed the range of a 16-bit signed int, so saturation doesn’t apply here. The semantics are different to ARM’s addp in that it widens the result – it’s more like *addlp, though I’m sure emulating addp wouldn’t take 9 instructions.

> Re int negation in ssse3 - it appears they count the mask load as a separate op.

I suppose that’s a fair point, although it rarely applies in practice (since you can hold a register of all 1’s and just re-use that). Though by that definition,… Read more »

blu

5 years ago

-.-

> 8-bit addition cannot exceed the range of a 16-bit signed int, so saturation doesn’t apply here. The semantics are different to ARM’s addp in that it widens the result – it’s more like *addlp, though I’m sure emulating addp wouldn’t take 9 instructions.

Saturation doesn’t apply, but the widening does, so it’s not a single op, was my point. To get the effect of asimd2’s addp you’d need something along:

pmaddubsw v0, v_ones
pmaddubsw v1, v_ones
pand v0, v_low_bytes_mask
pand v1, v_low_bytes_mask
packuswb v0, v1

which even without the multiplier/mask loading is still a far cry from a single… Read more »
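
The five-instruction sequence above can be checked against a plain pairwise add with a scalar Python model. This is only a sketch: each function imitates one instruction on lists of lane values, the operand names follow the comment (`v_ones` as sixteen 1s, `v_low_bytes_mask` as 0x00FF per 16-bit lane), and `addp_reference` is an assumed reading of a 16-byte addp that wraps modulo 256.

```python
def pmaddubsw(a_u8, b_s8):
    """Unsigned*signed byte products, adjacent pairs summed, saturated to s16."""
    out = []
    for i in range(0, len(a_u8), 2):
        total = a_u8[i] * b_s8[i] + a_u8[i + 1] * b_s8[i + 1]
        out.append(max(-32768, min(32767, total)))
    return out


def pand(a, mask):
    """Lane-wise bitwise AND."""
    return [x & m for x, m in zip(a, mask)]


def packuswb(a_s16, b_s16):
    """Pack two vectors of s16 lanes into u8 lanes with unsigned saturation."""
    return [max(0, min(255, x)) for x in a_s16 + b_s16]


def addp_emulated(v0, v1):
    """Pairwise 8-bit add of two 16-byte vectors via the sequence above."""
    v_ones = [1] * 16
    v_low_bytes_mask = [0x00FF] * 8
    s0 = pand(pmaddubsw(v0, v_ones), v_low_bytes_mask)
    s1 = pand(pmaddubsw(v1, v_ones), v_low_bytes_mask)
    return packuswb(s0, s1)


def addp_reference(v0, v1):
    """Adjacent-pair sums over the concatenated inputs, wrapping at 8 bits."""
    cat = v0 + v1
    return [(cat[i] + cat[i + 1]) & 0xFF for i in range(0, len(cat), 2)]
```

With multipliers of 1 the widened sums stay within 0..510, so the 16-bit saturation never triggers, and the AND keeps only the low byte so the final pack does not saturate either.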

-.-

5 years ago

blu

> Add to the fact how the table seems to count const loads as extra ops, and that operation goes beyond ‘5’ and into the ‘9’ category.

I count 7 from your implementation, so I was right about it definitely not being 9.

> For ints the natural LE/GE solution is exactly two ops: LE == GT + not

According to the table, GT is 1 op, NOT is 2 ops, so GT+NOT = 3 ops, not 2. You do bring up a good point with MAX+EQ though – I did not consider that. Can you think of a 2… Read more »
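
The two ways of building a lane-wise "less than or equal" that come up in this exchange, GT followed by NOT and MAX followed by EQ, can be compared with a small scalar model. The function names are made up for this sketch, and NOT is modeled as XOR with an all-ones constant, which is the usual way to express it when there is no dedicated vector NOT.

```python
def cmpgt(a, b):
    """Lane-wise signed greater-than: all-ones (-1) where true, else 0."""
    return [-1 if x > y else 0 for x, y in zip(a, b)]


def cmpeq(a, b):
    """Lane-wise equality: all-ones (-1) where true, else 0."""
    return [-1 if x == y else 0 for x, y in zip(a, b)]


def max_s(a, b):
    """Lane-wise signed maximum."""
    return [max(x, y) for x, y in zip(a, b)]


def not_mask(m):
    """NOT modeled as XOR with an all-ones constant (the extra const load)."""
    return [x ^ -1 for x in m]


def le_gt_not(a, b):
    """a <= b as NOT(a > b)."""
    return not_mask(cmpgt(a, b))


def le_max_eq(a, b):
    """a <= b as max(a, b) == b, which needs no constant."""
    return cmpeq(max_s(a, b), b)
```

Both return the same mask for any input; the difference is only whether an all-ones constant has to be available.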

blu

5 years ago

-.-

> I count 7 from your implementation, so I was right about it definitely not being 9.

Many of the ops there are not 9 ops in either x86 or arm. I believe I’ve already explained my interpretation of the table, to which you disagree. So we agree to disagree. I’ll just comment on a couple of things below.

> According to the table, GT is 1 op, NOT is 2 ops, so GT+NOT = 3 ops, not 2.

You’re right: with the const load that would indeed amount to 3 (as in the not case amounting to 2,… Read more »

-.-

5 years ago

blu

> You _really_ don’t want to mix sse and avx ops on an amd64

It was mostly to save typing a third parameter, since you get the idea anyway and the compiler would never mix the two. Though I really see no problem with it since there’s no transition penalty between 128b instructions, and SSE encodings are often shorter than VEX encodings.

> avx would save the move

I derped when I thought of the implementation, but the move isn’t necessary for SSE:

pxor xmm0, xmm0
pcmpgtq xmm0, v
paddq v, xmm0
pxor v, xmm0

So 4 ops (assuming you… Read more »
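
Read on its own, the four-instruction sequence above looks like the classic sign-mask absolute value for 64-bit lanes: build a mask that is all ones where the lane is negative, add it, then XOR with it. That reading is an assumption, since the comments that introduce it are truncated. A scalar Python sketch of that interpretation, with made-up helper names:

```python
MASK64 = (1 << 64) - 1


def to_s64(x):
    """Wrap an integer into the signed 64-bit range, like a qword lane."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def abs_s64_lanes(v):
    """Model of: pxor xmm0,xmm0 / pcmpgtq xmm0,v / paddq v,xmm0 / pxor v,xmm0."""
    zero = [0] * len(v)                                  # pxor xmm0, xmm0
    mask = [-1 if z > x else 0 for z, x in zip(zero, v)]  # pcmpgtq xmm0, v
    v = [to_s64(x + m) for x, m in zip(v, mask)]         # paddq v, xmm0
    v = [to_s64(x ^ m) for x, m in zip(v, mask)]         # pxor v, xmm0
    return v
```

For a negative lane the mask is -1, so the lane becomes `~(x - 1)`, which equals `-x`; non-negative lanes pass through unchanged. As with any two's-complement absolute value, the most negative value maps to itself.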

blu

5 years ago

-.-

My narrative? I’ve explained my ‘narrative’ about three posts ago. Judging by your last response, you’re yet to catch up with that.

> I derped when I thought of the implementation, but the move isn’t necessary for SSE:

Congratulations on shaving off the move from the sane implementation. Perhaps the authors have derped as well in some of their mappings? It’s funny when we hold others to high standards, when we can just derp and come around on a high horse later.

> I claim that 1+2=2. I make no claims in regards to the precision of the answer.

Do… Read more »

-.-

5 years ago

blu

> I see the table bugs you a lot

It actually doesn’t. Your claim that it’s useful is what bugs me a lot, even when a number of mistakes were pointed out. Or rather, the endorsement promoting misleading information being disseminated is what bugs me a lot. Wrong information is not useful in my book. “1+2=2” is wrong, and hence not a useful statement in my book. Clearly you think otherwise, which is fine I suppose, but methinks that you may be alone with that view. (note that *accurate* statements don’t have to be precise: “1+2 is between 2 and… Read more »

dave

5 years ago

PIK is merely a research project. PIK, avif and jpeg have joined together to make the Jpeg XL image format which will be ready later this year.