node.js · http · zlib

How can I decompress gzipped HTTP bodies in a node.js application?


First off, I tried everything I could find on the web, including the answers to related questions on SO, but could not make anything work. That's why I'm asking a new question.

I use the Lua filter in Envoy to send headers and body chunks to a simple node application that just writes them to files. The node application receives the headers and body chunks together with an HTTP transaction index and a flag indicating whether they belong to the request or the response, so it is able to reconstruct both the request and the response in their entirety.

On occasion, the response sent from the Lua filter is gzipped. No matter how I employ node's zlib library, I always get the same error when trying to decompress it.

For example, these are the response contents (body truncated and irrelevant headers removed):

content-encoding: gzip
content-type: application/json
transfer-encoding: chunked

�ėmo ...

When I strip the headers, along with the newlines before the body that I know I add myself when writing the file, and try to decompress the remaining part with:

const fs = require('fs');
const zlib = require('zlib');

let compressed = fs.readFileSync(...);
zlib.gunzip(compressed, function(err, uncompressed) {
    if (err) {
        console.log("Error: " + err.message);
        return;
    }
    console.log(uncompressed);
});

I always get this error:

Error: incorrect header check
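
That error means zlib did not find the gzip magic bytes (0x1f 0x8b) at the start of the input. One quick sanity check (a sketch of my own; "capture.bin" stands in for the real file path) is to look at the first two bytes of what was written to disk:

const fs = require('fs');

// A valid gzip stream always starts with the magic bytes 0x1f 0x8b.
const compressed = fs.readFileSync("capture.bin");
console.log(compressed.subarray(0, 2)); // expect <Buffer 1f 8b>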

I get the same error with code like this:

const fs = require('fs');
const zlib = require('zlib');

const gunzip = zlib.createGunzip();
var buffer = [];
gunzip.on('data', function(data) {
    buffer.push(data.toString());
}).on("end", function() {
    console.log(buffer.join(""));
}).on("error", function(e) {
    console.log("Error: " + e.message);
});
let compressed = fs.readFileSync(...);
// end() rather than write(), so the stream flushes and 'end' can fire
gunzip.end(compressed);

If I capture the same request with tcpdump, I can see that what's in the TCP stream matches the data sent from the Lua filter to node. Moreover, if I load the capture in Wireshark and ask it to follow the HTTP stream, it decompresses the data without complaint, so I know the data is valid and there is a way to decompress it.

What's the correct way to decompress chunks of a gzipped HTTP body using JavaScript running on node?


Solution

  • Long story short, PEBCAK - sort of. But I will leave this here in case someone else runs into a similar problem.

    What I started out with was something like this - given that I was only expecting plain text data:

    let data = "";
    request.on("data", chunk => {
        data += chunk.toString();
    });
    request.on("end", chunk => {
        if (chunk) {
            data += chunk.toString();
        }
        doSomethingWithTheBinaryData(data);
    });
    

    This, however, was coercing binary data into JavaScript strings. Buffer#toString() decodes the bytes as UTF-8, and compressed data is not valid UTF-8, so the content was silently modified along the way and zlib could no longer decompress it.
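
    To see the corruption directly, here is a small demonstration (my own sketch, not part of the application code): round-tripping gzipped bytes through a string replaces every invalid UTF-8 sequence with the replacement character, after which the data no longer even starts with the gzip magic bytes.

    const zlib = require('zlib');

    const original = zlib.gzipSync('hello world');
    // toString() decodes as UTF-8; gzip output is not valid UTF-8, so
    // some bytes are silently replaced on the way to the string.
    const corrupted = Buffer.from(original.toString());

    console.log(original.equals(corrupted)); // false
    zlib.gunzipSync(corrupted);              // throws "incorrect header check"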

    A proper way to receive binary data from an http request is to collect the raw Buffer chunks and concatenate them once the stream ends:

    let data = [];
    request.on("data", chunk => {
        data.push(chunk);
    });
    request.on("end", chunk => {
        if (chunk) {
            data.push(chunk);
        }
        let allData = Buffer.concat(data);
        doSomethingWithTheBinaryData(allData);
    });
    

    Once I switched to this second form, both text and compressed data started to be received and processed correctly.
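
    For reference, putting the two parts together, here is a minimal end-to-end sketch (assuming a plain node http server; the handler shape and the content-encoding check are my own additions, so adapt them to wherever the captured headers actually arrive; doSomethingWithTheBinaryData is the same placeholder as above):

    const http = require('http');
    const zlib = require('zlib');

    http.createServer((request, response) => {
        const chunks = [];
        request.on("data", chunk => chunks.push(chunk));
        request.on("end", () => {
            const body = Buffer.concat(chunks);
            // Only gunzip bodies that were actually gzipped.
            if (request.headers["content-encoding"] === "gzip") {
                zlib.gunzip(body, (err, uncompressed) => {
                    if (err) {
                        console.log("Error: " + err.message);
                    } else {
                        doSomethingWithTheBinaryData(uncompressed);
                    }
                    response.end();
                });
            } else {
                doSomethingWithTheBinaryData(body);
                response.end();
            }
        });
    }).listen(8080);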