I implemented the deflate algorithm as part of a PNG encoder in VHDL. Most of the test inputs are deflated correctly, but there is a problem with two specific inputs and I can't find the cause.
First input are five rows with the content: 0 1 1 1 1 1 1 1 1 1
. I. e. it's a 3x5 RGB image with all zeros and the filter type (=0) prepended.
Second input are three rows with the content: 0 1 1 1 1 1 1 1 1 1 1
. I. e. it's a 5x3 grayscale + alpha image with all zeros and the filter type (=0) prepended.
The output of the simulation is
['0x78', '0x1', '0x63', '0x60', '0x84', '0x1', '0x24', '0x26', '0x12', '0x13', '0x89', '0x9', '0x5', '0x0', '0x4', '0x97', '0x0', '0x2e']
for the first input, respectively
['0x78', '0x1', '0x63', '0x60', '0x84', '0x3', '0x24', '0x16', '0x12', '0xb', '0x0', '0x2', '0x10', '0x0', '0x1f']
for the second input.
For verification, I'm using pythons zlib module:
>>> zlib.adler32(bytes(([0] + [1]*9)*5)).to_bytes(4, "big")
b'\x04\x97\x00.'
>>> deflated_data = bytes([int(b, 16) for b in ['0x78', '0x1', '0x63', '0x60', '0x84', '0x1', '0x38', '0x13', '0xce', '0x84', '0x33', '0xe1', '0x4c', '0x10', '0x0', '0x0', '0x4', '0x97', '0x0', '0x2e']])
>>> zlib.decompress(deflated_data)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: invalid distance too far back
>>> zlib.adler32(bytes(([0] + [1]*10)*3)).to_bytes(4, "big")
b'\x02\x10\x00\x1f'
>>> deflated_data = bytes([int(b, 16) for b in ['0x78', '0x1', '0x63', '0x60', '0x84', '0x3', '0x24', '0x16', '0x12', '0xb', '0x0', '0x2', '0x10', '0x0', '0x1f']])>>> zlib.decompress(deflated_data)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect data check
So the adler checksum is correct, but both seem to be invalid deflate streams. Next, I tried to decode the streams myself. Start point is the binary and inverted representation of the streams, which can be read from left to right. Zlib header and adler checksum are cut out. Huffman tables are taken from https://www.ietf.org/rfc/rfc1951.txt.
['11000110', '00000110', '00100001', '10000000', '00100100', '01100100', '01001000', '11001000', '10010001', '10010000', '10100000', '00000000']
110 - block header (last block and fixed huffman code)
00110000 - literal 0 -> 0
00110001 - literal 1 -> 1
0000110 - match length 8
00000 - match distance 1 -> 11111111
--------------------------------
0001001 - match length 11/12
0 - match length 11
00110 - match distance 9-12
01 - match distance 10
-------------------------------- repeat two times -> 01111111110, 11111111101, 11111111011
0000101 - match length 7
0 - match distance 1 -> 1111111
0000000 - end of block
00000 - will be discarded
This yields the original data:
0 1 11111111
01111111110
11111111101
11111111011
1111111
['11000110', '00000110', '00100001', '11000000', '00100100', '01101000', '01001000', '11010000', '00000000']
110 - block header (last block and fixed huffman code)
00110000 - literal 0 -> 0
00110001 - literal 1 -> 1
0000111 - match length 9
00000 - match distance 1 -> 111111111
--------------------------------
0001001 - match length 11/12
0 - match length 11
00110 - match distance 9-12
10 - match distance 11
-------------------------------- repeat one time -> 01111111111, 01111111111
0000000 - end of block
0000 - will be discarded
This yields the original data:
0 1 111111111
01111111111
01111111111
I can't see an invalid distance or any invalid/incorrect data. Can someone point out where my mistake is? Is there maybe a (python) library, which could do the inflate with verbose output?
The first one has a distance too far back. The second one has an Adler-32 mismatch.
infgen output for the first one:
! infgen 2.4 output
!
zlib
!
last
fixed
literal 0 1
match 8 1
match 11 11
infgen warning: distance too far back (11/10)
match 11 11
match 11 11
match 7 1
end
!
adler
What you thought was distance 10 was actually distance 11. That is because the extra bits are not reversed. The extra bits there are 10
, not 01
. Hence distance 11, not 10.
For the second one, infgen gives:
! infgen 2.4 output
!
zlib
!
last
fixed
literal 0 1
match 9 1
match 11 10
match 11 10
end
!
adler
Same problem, but instead thinking it's 10
when it is really 01
. Now the distance isn't too far, but rather not far enough to generate the data you want. Hence an incorrect Adler-32 check.