I'm trying to understand how deflate works. To wit I thought I'd try to decode a deflate encoded string by hand going off of what RFC1951: DEFLATE Compressed Data Format Specification version 1.3 says. I generated the deflate encoded string thusly:
$result = gzdeflate('A_DEAD_DAD_CEDED_A_BAD_BABE_A_BEADED_ABACA_BED');
echo bin2hex($result);
That gave me this:
1589c11100000cc166a3cc61ff2dca237709880c45e52c2b08eb043dedb78db8851e
From RFC1951 § 3.2.3. Details of block format:
Each block of compressed data begins with 3 header bits
containing the following data:
first bit BFINAL
next 2 bits BTYPE
...
BFINAL is set if and only if this is the last block of the data
set.
BTYPE specifies how the data are compressed, as follows:
00 - no compression
01 - compressed with fixed Huffman codes
10 - compressed with dynamic Huffman codes
11 - reserved (error)
0x15 is 0b00010101 in bits. The first 3 bits are 000, which would make BFINAL 0 and BTYPE 00 (no compression).
From later in RFC1951 § 3.2.3. Details of block format:
if stored with no compression
skip any remaining bits in current partially
processed byte
read LEN and NLEN (see next section)
copy LEN bytes of data to output
From RFC1951 § 3.2.4. Non-compressed blocks (BTYPE=00):
Any bits of input up to the next byte boundary are ignored.
The rest of the block consists of the following information:
0 1 2 3 4...
+---+---+---+---+================================+
| LEN | NLEN |... LEN bytes of literal data...|
+---+---+---+---+================================+
LEN is the number of data bytes in the block. NLEN is the
one's complement of LEN.
So per all that I move to the next byte - 0x89, which is 0b10001001 in bits. The first two bits (LEN) are 10 and the next two bits are NLEN (00). But herein lies the problem. 00 isn't the one's complement of 10 - 01 is.
Also, what if you wanted to have an uncompressed block with 0xFF as the first byte? Is that just not possible?
You didn't read this part of the RFC:
Data elements are packed into bytes in order of increasing bit number within the byte, i.e., starting with the least-significant bit of the byte.
The first three bits are 101
, not 000
.
A decoding of that stream using infgen:
! infgen 2.6 output
!
last
dynamic
count 259 10 16
code 17 4
code 18 3
code 0 4
code 4 3
code 3 1
code 2 3
zeros 65
lens 3 3 4 3 3
zeros 25
lens 3
zeros 138
zeros 22
lens 4 3 3
zeros 3
lens 2 0 0 2 2 3 3
! litlen 65 3
! litlen 66 3
! litlen 67 4
! litlen 68 3
! litlen 69 3
! litlen 95 3
! litlen 256 4
! litlen 257 3
! litlen 258 3
! dist 3 2
! dist 6 2
! dist 7 2
! dist 8 3
! dist 9 3
literal 'A_DEAD_D
match 3 4
literal 'CEDED_A_B
match 3 12
literal 'BABE
match 4 11
match 3 28
match 4 20
literal 'BAC
match 4 13
literal 'D
end