I am really interested in seeing a numerical example of how deflate compression works, by hand.
The following very short text 'abc' has been compressed using the deflate algorithm, producing 'eJxLTEoGAAJNASc=', which in binary notation is:
01100101 01001010 01111000 01001100 01010100 01000101 01101111 01000111 01000001 01000001 01001010 01001110 01000001 01010011 01100011 00111101
Can anyone show, by hand, the bit-counting steps that decode this string of 0s and 1s back into the original string 'abc', please?
Thank you!
Your binary dump is of the Base64 string you provided, not the actual binary compressed data. That data is, in hex:
78 9c 4b 4c 4a 06 00 02 4d 01 27
Or in binary:
01111000 10011100 01001011 01001100 01001010 00000110 00000000 00000010 01001101 00000001 00100111
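The mismatch is easy to verify with a quick sketch using Python's standard library (my illustration, not part of the original data):

```python
import base64
import zlib

# Decode the Base64 transport encoding to get the actual compressed bytes.
raw = base64.b64decode("eJxLTEoGAAJNASc=")
print(raw.hex(" "))          # 78 9c 4b 4c 4a 06 00 02 4d 01 27

# Round-trip check: the zlib stream decompresses back to the original text.
print(zlib.decompress(raw))  # b'abc'
```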
You can use infgen to disassemble deflate streams. Your data is actually a zlib wrapper around a deflate stream:
! infgen 2.5 output
!
zlib
!
last ! 1
fixed ! 01
literal 'a ! 10010001
literal 'b ! 10010010
literal 'c ! 10010011
end ! 0000000
! 000000
!
adler
The deflate format is documented in RFC 1951, and the zlib wrapper in RFC 1950.
The first two bytes are the zlib header. Then the low bits of the next byte are 011, where the low 1 says that this is the last block, and the 01 above that says that this is a fixed block. Note that the bits are read from least significant to most significant (bottom up). The remaining bits of the five bytes in the deflate data are the symbols 'a', 'b', and 'c', and the end-of-block symbol. Those are followed by the four-byte Adler-32 check value of the uncompressed data.
This is a pretty boring example, since it's so short. You'll need a much longer example to make use of dynamic blocks, so that you can explore a dynamic block header in its full glory.