I have this piece of C# code:
public static byte[] TestGzip(string text)
{
byte[] bytes = Encoding.UTF8.GetBytes(text);
MemoryStream memoryStream1 = new MemoryStream();
using (GZipStream gzipStream = new GZipStream(memoryStream1, CompressionMode.Compress, true))
gzipStream.Write(bytes, 0, bytes.Length);
memoryStream1.Position = 0L;
byte[] buffer = new byte[memoryStream1.Length];
memoryStream1.Read(buffer, 0, buffer.Length);
return buffer;
}
I wanted to reproduce this code in JavaScript, so I tried pako and Node.js zlib.
Their output differs slightly from GZipStream's and from each other's:
const zlib = require('zlib');
const pako = require('pako');
const cc = str => [...str].map(c => c.charCodeAt(0) & 255);
// C# (this is what I want)
Program.TestGzip("a") // [31, 139, 8, 0, 0, 0, 0, 0, 4, 0, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0]
// JS
pako.gzip("a") // [31, 139, 8, 0, 0, 0, 0, 0, 0, 3, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0] Uint8Array(21)
pako.gzip([97]) // same...
pako.gzip(new Uint8Array([97])) // same...
pako.gzip(cc("a")) // same...
zlib.gzipSync("a") // [31, 139, 8, 0, 0, 0, 0, 0, 0, 10, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0] Buffer(21)
zlib.gzipSync(new Uint8Array([97])) // same...
I also tried some different options of pako and zlib, and while some options changed the result, it never matched the C# result:
// different options
zlib.gzipSync("a", {level: 1}) // [31, 139, 8, 0, 0, 0, 0, 0, 4, 10, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0]
zlib.gzipSync("a", {level: 9}) // [31, 139, 8, 0, 0, 0, 0, 0, 2, 10, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0]
zlib.gzipSync("a", {strategy: 2|3}) // [31, 139, 8, 0, 0, 0, 0, 0, 4, 10, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0]
pako.gzip("a", {level: 1}) // [31, 139, 8, 0, 0, 0, 0, 0, 4, 3, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0]
pako.gzip("a", {level: 9}) // [31, 139, 8, 0, 0, 0, 0, 0, 2, 3, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0]
pako.gzip("a", {strategy: 2|3}) // [31, 139, 8, 0, 0, 0, 0, 0, 4, 3, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0]
So what should I do?
Why are there these slight differences?
How can I get exactly the same output as GZipStream.Write()?
Fix (thanks to @Sebastian):
pako.gzip("a", {strategy: 2, header:{os: 0}})
pako.gzip("a", {strategy: 3, header:{os: 0}})
// weirdly enough, just passing an empty header object works as well:
pako.gzip("a", {strategy: 2, header:{}})
pako.gzip("a", {strategy: 3, header:{}})
// all outputs are exactly like GZipStream.Write():
// [31, 139, 8, 0, 0, 0, 0, 0, 4, 0, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0]
Looks like the libraries differ in the way they encode the header:
From http://www.onicos.com/staff/iz/formats/gzip.html
Offset Length Contents
...
8 1 byte extra flags (depend on compression method)
9 1 byte OS type
So they simply declare a different OS type: 10 (TOPS-20?!) for Node's zlib, 3 (Unix) for pako, and 0 (FAT) for GZipStream. You will probably have to patch the JS libraries to output "FAT" as the OS, if you really want that.
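To see which header bytes differ between two outputs, it can help to decode the fixed 10-byte gzip header. The following is a minimal sketch; `describeGzipHeader` and `OS_NAMES` are hypothetical names invented for illustration, and the name table only covers the three OS codes that appear in the question.

```javascript
// Hypothetical lookup table: only the OS codes seen in the question's outputs.
const OS_NAMES = { 0: "FAT", 3: "Unix", 10: "TOPS-20" };

// Hypothetical helper: decode the fixed 10-byte gzip header from an array-like of bytes.
function describeGzipHeader(bytes) {
  if (bytes.length < 10 || bytes[0] !== 31 || bytes[1] !== 139) {
    throw new Error("not a gzip stream");
  }
  return {
    method: bytes[2],      // 8 = deflate
    flags: bytes[3],
    // modification time, little-endian 32-bit
    mtime: (bytes[4] | (bytes[5] << 8) | (bytes[6] << 16) | (bytes[7] << 24)) >>> 0,
    extraFlags: bytes[8],  // offset 8: extra flags
    os: bytes[9],          // offset 9: OS type
    osName: OS_NAMES[bytes[9]] || "other",
  };
}
```

Running it on the C# output from the question reports `extraFlags: 4` and `os: 0`, while the default pako output reports `extraFlags: 0` and `os: 3`.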
Looking at the pako sources, you can probably change the values to your liking, and there is also a hint as to what the "extra flags" byte is used for. From GitHub:
put_byte(s, s.level === 9 ? 2 :
(s.strategy >= Z_HUFFMAN_ONLY || s.level < 2 ?
4 : 0));
put_byte(s, s.gzhead.os & 0xff);
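Read as a standalone function, the first `put_byte` call above picks the extra-flags value like this. This is only a restatement of the quoted snippet for readability; `extraFlags` is a hypothetical name, and `level` is assumed to be the already resolved compression level (0 to 9), not a "default" placeholder.

```javascript
const Z_HUFFMAN_ONLY = 2; // zlib strategy constant referenced in the snippet above

// Hypothetical restatement of the quoted expression that chooses the byte at offset 8.
function extraFlags(level, strategy) {
  if (level === 9) return 2;                                  // best compression
  if (strategy >= Z_HUFFMAN_ONLY || level < 2) return 4;      // fastest, or Huffman-only/RLE style strategies
  return 0;                                                   // everything else
}
```

This lines up with the outputs in the question: level 9 gives 2, level 1 or strategy 3 gives 4, and the defaults give 0. GZipStream writes 4 there, which is why a strategy of 2 or 3 (or level 1) is needed to match it.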
Adjust the level and strategy, as well as the os header field and you should be good to go!
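If you would rather stay with Node's built-in zlib than patch a library, one possible workaround is to overwrite the OS byte after compressing. This is a sketch under assumptions, not a guaranteed match: `gzipLikeDotNet` is a hypothetical name, it relies on level 1 producing an extra-flags byte of 4 as in the question's output, and it has only been reasoned about for the short input shown in the question. For longer inputs the compressed body itself may differ from what GZipStream produces.

```javascript
const zlib = require("zlib");

// Hypothetical helper: gzip with Node's zlib, then set the OS byte to 0 (FAT).
// Overwriting offset 9 does not affect decompression here, because the header
// flags byte is 0 (no header CRC) and the trailing CRC-32 covers only the
// uncompressed data.
function gzipLikeDotNet(text) {
  // Assumption: level 1 makes zlib write 4 at offset 8, as seen in the question.
  const out = zlib.gzipSync(Buffer.from(text, "utf8"), { level: 1 });
  out[9] = 0; // OS type: FAT, which is what the C# output declares
  return out;
}
```

The result still decompresses normally, for example with `zlib.gunzipSync(gzipLikeDotNet("a")).toString("utf8")`.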