I noticed a significant performance difference between the zlib library shipped with the system and the one I rebuilt from source, although both report zlib version 1.2.11. I am running macOS 10.13.6. Here is the test program:
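#include <stdio.h>
#include <stdlib.h>
#ifdef LOCAL_ZLIB
#include "./zlib-1.2.11/zlib.h"
#else
#include <zlib.h>
#endif

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s file.gz\n", argv[0]);
        return 1;
    }
    printf("zlib version %s\n", zlibVersion());
    gzFile testFile = gzopen(argv[1], "rb");
    if (testFile == NULL) {
        fprintf(stderr, "cannot open %s\n", argv[1]);
        return 1;
    }
    int buffsize = 1024 * 1024;
    char *buffer = (char *) calloc(buffsize, sizeof(char));
    if (buffer == NULL) {
        gzclose(testFile);
        return 1;
    }
    /* Decompress the whole file and discard the output. */
    while (gzread(testFile, buffer, buffsize) > 0)
        ;
    free(buffer);
    gzclose(testFile);
    return 0;
}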
It simply decompresses the file into the buffer with gzread. With the system zlib, linked dynamically:
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR374/006/SRR3744956/SRR3744956_1.fastq.gz
gcc bench_zlib.c -O3 -o bench_zlib -lz
time ./bench_zlib SRR3744956_1.fastq.gz
Which gives:
zlib version 1.2.11
real 0m3.711s
user 0m3.599s
sys 0m0.105s
With zlib recompiled from source (same version) and linked statically:
wget https://www.zlib.net/zlib-1.2.11.tar.gz
tar -xzvf zlib-1.2.11.tar.gz
cd zlib-1.2.11
./configure
make
cd ..
gcc bench_zlib.c ./zlib-1.2.11/libz.a -O3 -o bench_zlib -DLOCAL_ZLIB
time ./bench_zlib SRR3744956_1.fastq.gz
Which gives:
zlib version 1.2.11
real 0m5.236s
user 0m5.113s
sys 0m0.112s
The version I recompiled locally from source is about 40% slower. Is there an explanation?
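For reference, the 40% figure comes from the two "real" times above and works out to roughly 41%. A minimal sketch of the arithmetic (the helper names slowdown_percent and print_slowdown are only illustrative):

```c
#include <stdio.h>

/* Hypothetical helper: how much slower `candidate_s` is than `baseline_s`,
   expressed in percent of the baseline time. */
double slowdown_percent(double baseline_s, double candidate_s)
{
    return (candidate_s / baseline_s - 1.0) * 100.0;
}

/* Prints the slowdown for the two wall-clock times measured above
   (3.711 s with the system zlib, 5.236 s with the rebuilt one). */
void print_slowdown(void)
{
    printf("%.1f%% slower\n", slowdown_percent(3.711, 5.236));
}
```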
Things I have already considered:
Could the system version be compiled with special options that make it faster? (But 40% seems like a lot, and the zlib library is already compiled with -O3.)
As pointed out by Mark Adler in his comment, the code used in the macOS library must be different. The confusion comes from the fact that Apple did not change the library version string.
I guess they use something similar to this version: https://github.com/jtkukunas/zlib (1.2.11.1-motley), where the CRC computation is vectorized. Profiling showed that the crc function is 9x faster in the Apple zlib than in zlib 1.2.11, which is similar to the performance of zlib "1.2.11.1-motley".
On a 4 GB gzipped file, I measured the following decompression times:
apple zlib 1.2.11 (dynamic zlib included in macOS 10.13.6) : 47.9 s
vanilla zlib 1.2.11 (from zlib.net) : 70.8 s
zlib 1.2.11.1-motley (from github.com/jtkukunas/zlib) : 48.4 s
Moreover, when calling gzbuffer(testFile, 1 << 20), which increases the zlib buffer to 1 MB, the Apple zlib becomes a little faster than zlib 1.2.11.1-motley:
apple zlib 1.2.11 : 43.9 s
vanilla zlib 1.2.11 : 67.1 s
zlib 1.2.11.1-motley : 48.3 s
So I guess that on top of the vectorized CRC, they also have some other optimizations.