Arm servers are already deployed in some datacenters, but they are pretty new compared to their Intel counterparts, so at this stage software may not always be optimized as well on Arm as on Intel. Vlad Krasnow working for Cloudflare found one of those unoptimized cases when testing out Jpegtran – a utility performing lossless transformation of JPEG files – on one of their Xeon Silver 4116 Server:
1 2 3 4 5 |
vlad@xeon:~$ time ./jpegtran -outfile /dev/null -progressive -optimise -copy none test.jpg real 0m2.305s user 0m2.059s sys 0m0.252s |
and comparing it to one based on Qualcomm Centriq 2400 Arm SoC:
1 2 3 4 5 |
vlad@arm:~$ time ./jpegtran -outfile /dev/null -progressive -optimise -copy none test.jpg real 0m8.654s user 0m8.433s sys 0m0.225s |
Nearly four times slower on a single core. Not so good, as the company aims for at least 50% of the performance since the Arm processor has double the number of cores. Vlad did some optimization on The Intel processor using SSE instructions before, so he decided to look into optimization the Arm code with NEON instructions instead. First step was to check which functions may slowdown the […]