Tags: deflate, dotnetzip, deflatestream

DeflateStream compress/decompress inconsistency


I have the following data from a Photoshop file that uses zip-compression (RFC 1951):

250, 255, 159, 1, 47, 248, 63, 42, 63, 172, 229, 1, 2, 12, 0, 209, 255, 31, 225

This decompresses to the following 32 bytes, repeated 16 times (512 bytes in total):

255, 255, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

Re-compressing this gives me:

251, 255, 159, 1, 47, 248, 63, 42, 63, 172, 229, 1

Why isn't this exactly the same as the original input?

(Originally posted on CodePlex, where it got no answers: http://dotnetzip.codeplex.com/discussions/406943)


Solution

  • First, to get the terminology right, RFC 1951 is the deflate format (which your data is), not "zip-compression". zip can use deflate, but the deflate data is then wrapped with zip headers, trailers, and a directory.
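To make the wrapper point concrete, here is a minimal sketch in Python. The language and its standard `zipfile`, `zlib`, and `struct` modules are chosen only for illustration and are not from the question, and the member name `data.bin` is made up. It builds a one-entry zip archive in memory, skips past the local file header, and raw-inflates the bytes that follow, which suggests that a plain RFC 1951 deflate stream sits inside the zip container.

```python
import io
import struct
import zipfile
import zlib

# The 512 bytes from the question: 255, 255, then thirty zeros, 16 times over.
data = (bytes([255, 255]) + bytes(30)) * 16

# Build a one-entry zip archive in memory, using deflate for the entry.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data.bin", data)  # "data.bin" is a hypothetical member name
archive = buf.getvalue()

with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    info = zf.getinfo("data.bin")

# A local file header is 30 fixed bytes followed by the file name and the
# extra field; their lengths are stored at offsets 26 and 28 of the header.
name_len, extra_len = struct.unpack_from("<HH", archive, info.header_offset + 26)
start = info.header_offset + 30 + name_len + extra_len
payload = archive[start:start + info.compress_size]

# wbits=-15 tells zlib to expect a raw deflate stream (no header, no trailer).
inflater = zlib.decompressobj(-15)
recovered = inflater.decompress(payload) + inflater.flush()

print(archive[:4])                 # zip local file header signature
print(len(payload), len(archive))  # deflate payload vs. whole archive
print(recovered == data)
```

The archive is noticeably larger than the deflate payload it carries; the difference is the zip headers and central directory described above.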

    Second, there is in general no assurance that decompressing and then recompressing will reproduce the original compressed bytes. Most compressors have different compression levels and other options that can produce different compressed output for the same input. The only thing a lossless compressor guarantees is that compressing and then decompressing gives you back the same data.
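As a rough illustration of that asymmetry, the sketch below compresses the same 512 bytes at two different zlib levels. Python's `zlib` stands in here for whatever compressor produced the data in the question, and the helper names `raw_deflate` and `raw_inflate` are made up for this example. The two compressed streams differ, yet both inflate back to the same input.

```python
import zlib

def raw_deflate(data: bytes, level: int) -> bytes:
    """Compress to a raw RFC 1951 stream (wbits=-15: no zlib header/trailer)."""
    c = zlib.compressobj(level, zlib.DEFLATED, -15)
    return c.compress(data) + c.flush()

def raw_inflate(stream: bytes) -> bytes:
    """Inflate a raw RFC 1951 stream."""
    d = zlib.decompressobj(-15)
    return d.decompress(stream) + d.flush()

# The 512 bytes from the question: 255, 255, then thirty zeros, 16 times over.
data = (bytes([255, 255]) + bytes(30)) * 16

stored = raw_deflate(data, 0)  # level 0 emits stored (uncompressed) blocks
packed = raw_deflate(data, 9)  # level 9 searches hardest for matches

print(len(stored), len(packed))
print(stored == packed)                                    # False
print(raw_inflate(stored) == raw_inflate(packed) == data)  # True
```

Level 0 versus level 9 is an extreme pairing picked so the difference is unmistakable; closer settings can also differ, just less visibly.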

    For your particular example, the first compressor threw in some extraneous empty blocks (two of them). That deflate stream disassembled:

    static
    literal 255 255 0
    match 29 1
    literal 255
    match 258 32
    match 221 32
    end
    !
    static
    end
    !
    last
    static
    end
    

    The second compressor did not include the extraneous empty blocks:

    last
    static
    literal 255 255 0
    match 29 1
    literal 255
    match 258 32
    match 221 32
    end
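The equivalence of the two streams can be checked directly. The following is a minimal sketch using Python's `zlib` rather than DotNetZip, with a made-up helper name `raw_inflate`; it inflates both byte sequences from the question as raw deflate and compares the results. It also prints whatever input is left over after the final block. For the original data that is four bytes, and they happen to equal the Adler-32 of the decompressed output, which hints that the bytes were taken from a zlib (RFC 1950) stream with its two-byte header removed. That last reading is an inference from the numbers, not something stated in the answer.

```python
import zlib

original = bytes([250, 255, 159, 1, 47, 248, 63, 42, 63, 172, 229, 1,
                  2, 12, 0, 209, 255, 31, 225])
recompressed = bytes([251, 255, 159, 1, 47, 248, 63, 42, 63, 172, 229, 1])

def raw_inflate(stream: bytes):
    """Inflate a raw deflate stream; return (output, bytes left after it)."""
    d = zlib.decompressobj(-15)  # wbits=-15: raw deflate, no header/trailer
    out = d.decompress(stream) + d.flush()
    return out, d.unused_data

out_a, rest_a = raw_inflate(original)
out_b, rest_b = raw_inflate(recompressed)

# 255, 255, then thirty zeros, repeated 16 times.
expected = (bytes([255, 255]) + bytes(30)) * 16

print(out_a == out_b == expected)  # both streams carry the same payload
print(len(out_a))                  # size of the decompressed data
print(list(rest_a), list(rest_b))  # input not consumed by the inflater
print(int.from_bytes(rest_a, "big") == zlib.adler32(out_a))
```

So the only differences between the two inputs are the block framing shown in the disassemblies and the trailing bytes that follow the deflate data proper.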