Tags: javascript, c#, zlib, gzipstream, pako

How to replicate GZipStream.Write() in JavaScript?


I have this piece of C# code:

public static byte[] TestGzip(string text)
{
    byte[] bytes = Encoding.UTF8.GetBytes(text);
    MemoryStream memoryStream1 = new MemoryStream();

    using (GZipStream gzipStream = new GZipStream(memoryStream1, CompressionMode.Compress, true))
        gzipStream.Write(bytes, 0, bytes.Length);

    memoryStream1.Position = 0L;
    byte[] buffer = new byte[memoryStream1.Length];
    memoryStream1.Read(buffer, 0, buffer.Length);

    return buffer;
}

I want to reproduce it in JavaScript, so I tried pako and Node.js zlib.
Their output differs slightly from GZipStream's, and from each other's:

const zlib = require('zlib');
const pako = require('pako');
const cc = str => [...str].map(c => c.charCodeAt(0) & 255);

// C# (this is what I want)
Program.TestGzip("a")                 // [31, 139, 8, 0, 0, 0, 0, 0, 4, 0, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0]

// JS
pako.gzip("a")                        // [31, 139, 8, 0, 0, 0, 0, 0, 0, 3, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0] Uint8Array(21)
pako.gzip([97])                       // same...
pako.gzip(new Uint8Array([97]))       // same...
pako.gzip(cc("a"))                    // same...

zlib.gzipSync("a")                    // [31, 139, 8, 0, 0, 0, 0, 0, 0, 10, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0] Buffer(21)
zlib.gzipSync(new Uint8Array([97]))   // same...

I also tried different pako and zlib options. Some of them changed the result, but none matched the C# output:

// different options
zlib.gzipSync("a", {level: 1})        // [31, 139, 8, 0, 0, 0, 0, 0, 4, 10, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0]
zlib.gzipSync("a", {level: 9})        // [31, 139, 8, 0, 0, 0, 0, 0, 2, 10, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0]
zlib.gzipSync("a", {strategy: 2|3})   // [31, 139, 8, 0, 0, 0, 0, 0, 4, 10, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0]

pako.gzip("a", {level: 1})            // [31, 139, 8, 0, 0, 0, 0, 0, 4, 3, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0]
pako.gzip("a", {level: 9})            // [31, 139, 8, 0, 0, 0, 0, 0, 2, 3, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0]
pako.gzip("a", {strategy: 2|3})       // [31, 139, 8, 0, 0, 0, 0, 0, 4, 3, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0]

So what should I do?
Why are there these slight differences?
How can I achieve the exact GZipStream.Write() output?

Fix (thanks to @Sebastian):

pako.gzip("a", {strategy: 2, header:{os: 0}})
pako.gzip("a", {strategy: 3, header:{os: 0}})

// weirdly enough, just passing an empty header object works as well:
pako.gzip("a", {strategy: 2, header:{}})
pako.gzip("a", {strategy: 3, header:{}})

// all outputs are exactly like GZipStream.Write():
// [31, 139, 8, 0, 0, 0, 0, 0, 4, 0, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0]
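
The empty-header case is less surprising once you look at how the OS byte is computed. When a header object is supplied, the pako source quoted in the answer writes the byte as `s.gzhead.os & 0xff`. Assuming pako uses the object you pass as `gzhead` without filling in defaults (I have not verified this for every pako version), a missing `os` property is `undefined`, and `undefined & 0xff` evaluates to 0, which is the FAT code. A minimal sketch of just that expression, where `osByte` is a made-up helper and not part of pako:

```javascript
// Mirrors the expression from the pako source: s.gzhead.os & 0xff
// `osByte` is a hypothetical helper used only for illustration.
function osByte(header) {
  // The bitwise AND coerces undefined to 0, so a missing `os` yields 0.
  return header.os & 0xff;
}

const fromEmptyHeader = osByte({});        // header: {}
const fromExplicitFat = osByte({ os: 0 }); // header: {os: 0}
const fromUnix = osByte({ os: 3 });        // header: {os: 3}

console.log(fromEmptyHeader, fromExplicitFat, fromUnix); // prints: 0 0 3
```

So `header: {}` and `header: {os: 0}` end up writing the same OS byte.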

Solution

  • Looks like the libraries differ in the way they encode the header:

    From http://www.onicos.com/staff/iz/formats/gzip.html

    Offset   Length   Contents
     ...
      8      1 byte   extra flags (depend on compression method)
      9      1 byte   OS type
    

    So they simply declare a different OS type in byte 9: 10 (TOPS-20?!) for Node's zlib, 3 (Unix) for pako, and 0 (FAT) for GZipStream. You will probably have to patch the JS libraries to output "FAT" as the OS, if you really want that.

    Looking at the pako sources, you can probably change the values to your liking, and there is also a hint as to what the "extra flags" byte is used for. From GitHub:

    put_byte(s, s.level === 9 ? 2 :
                        (s.strategy >= Z_HUFFMAN_ONLY || s.level < 2 ?
                         4 : 0));
    put_byte(s, s.gzhead.os & 0xff);
    

    Adjust the level and strategy (which control the extra flags byte) as well as the os header field (which controls the OS byte), and you should be good to go!
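
If you would rather stay on Node's built-in zlib than configure pako, one possible workaround is to overwrite the two header bytes after compressing. The sketch below assumes the output has no optional header fields (FLG is 0, as in every output shown in the question), so the extra flags and OS bytes sit at offsets 8 and 9. The helper names `describeGzipHeader` and `gzipLikeGZipStream` are made up for this example. Patching is safe for decompression because the CRC-32 in the gzip trailer covers only the uncompressed data, not the header. It is not guaranteed that the compressed body matches GZipStream for larger inputs, since the two deflate implementations may make different choices.

```javascript
const zlib = require('zlib');

// OS codes from RFC 1952 (only the ones that appear in this thread).
const OS_NAMES = { 0: 'FAT', 3: 'Unix', 10: 'TOPS-20' };

// Hypothetical helper: read the two header bytes that differ between libraries.
function describeGzipHeader(bytes) {
  if (bytes[0] !== 31 || bytes[1] !== 139) {
    throw new Error('not a gzip stream');
  }
  return { xfl: bytes[8], os: bytes[9], osName: OS_NAMES[bytes[9]] || 'other' };
}

// Hypothetical helper: gzip with Node's zlib, then overwrite the extra flags
// and OS bytes with the values seen in the C# output (4 and 0).
function gzipLikeGZipStream(text) {
  const out = zlib.gzipSync(Buffer.from(text, 'utf8'));
  out[8] = 4; // extra flags (XFL)
  out[9] = 0; // OS type: FAT
  return out;
}

const csharpBytes = [31, 139, 8, 0, 0, 0, 0, 0, 4, 0, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0];
const patched = gzipLikeGZipStream('a');

console.log(describeGzipHeader(csharpBytes)); // { xfl: 4, os: 0, osName: 'FAT' }
console.log(describeGzipHeader(patched));     // { xfl: 4, os: 0, osName: 'FAT' }
console.log(zlib.gunzipSync(patched).toString('utf8')); // a
```

Compare the full byte arrays for your real inputs before relying on this, since only the header bytes are forced to match here.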