Tags: javascript, ajax, content-encoding

How can content-encoding be ignored?


I have a device I need to download a file from. In certain cases, the device sends the file with an incorrect Content-Encoding header. In particular, it may declare a content-encoding of "gzip" when the file is not gzipped, or compressed in any way.

When the file really is gzipped, it's simple to get the content using a basic ajax GET:

// IP holds the address of the device
$.ajax({
    url: 'http://' + IP + '/test.txt',
    type: 'GET'
})
.done(function(data) {
    // data is the response body, already decoded by the browser
    alert(data);
});

But this fails, as you might expect, when the content-encoding is wrong.

To be clear, I'm not looking for a way to bypass ERR_CONTENT_DECODING_FAILED when simply navigating to the given URL in a browser. I want to be able to load, for instance, a CSV file into a string in JavaScript for further parsing.

Can I GET the file, and force it to skip attempting decoding, or override the content-encoding of the response, or some such?


Solution

  • This is not possible from client-side JavaScript. The WHATWG XHR spec is built on the fetch operation from the WHATWG Fetch Standard, and that operation leaves no room for a script to skip decoding.

    Client-side scripts can only read the response object supplied by the browser environment. The Fetch Standard defines how the browser environment must build a response object's body attribute in step 2 of the fetch operation (note especially substeps 2 through 4):

    1. Whenever one or more bytes are transmitted, let bytes be the transmitted bytes and run these subsubsteps:

      1. Increase response's body's transmitted with bytes' length.

      2. Let codings be the result of parsing Content-Encoding in response's header list.

      3. Set bytes to the result of handling content codings given codings and bytes.

      4. Push bytes to response's body.

    Where the action handling content codings is:

    To handle content codings given codings and bytes, run these substeps:

    1. If codings are not supported, return bytes.

    2. Return the result of decoding bytes with the given codings as explained in HTTP.
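
    The two substeps above can be sketched in a few lines. This is only an illustration, not how any browser is implemented: it assumes Node.js, with the built-in zlib module standing in for the browser's decoder, and the names DECODERS and handleContentCodings are made up for the example.

```javascript
// Sketch of the Fetch Standard's "handle content codings" step, run in Node.js.
const zlib = require('node:zlib');

// Decoders for the codings this sketch treats as "supported" (an assumption).
const DECODERS = new Map([
  ['gzip', (buf) => zlib.gunzipSync(buf)],
  ['deflate', (buf) => zlib.inflateSync(buf)],
  ['br', (buf) => zlib.brotliDecompressSync(buf)],
]);

// codings: coding names in the order the Content-Encoding header lists them,
// e.g. "Content-Encoding: deflate, gzip" -> ['deflate', 'gzip'].
function handleContentCodings(codings, bytes) {
  // Substep 1: if codings are not supported, return bytes unchanged.
  if (!codings.every((name) => DECODERS.has(name))) {
    return bytes;
  }
  // Substep 2: decode. Codings are listed in the order they were applied,
  // so they are undone from last to first.
  return codings.reduceRight((buf, name) => DECODERS.get(name)(buf), bytes);
}

const plain = Buffer.from('a,b,c\n1,2,3\n');
const gzipped = zlib.gzipSync(plain);

// Correctly labelled body: decoding recovers the original text.
console.log(handleContentCodings(['gzip'], gzipped).equals(plain)); // true

// Mislabelled body: the header says gzip, the bytes are plain text.
try {
  handleContentCodings(['gzip'], plain);
} catch (err) {
  console.log('decoding failed:', err.message);
}
```

    The second call is the situation from the question: the only input besides the bytes is the list of codings taken from the header, so a wrong header means a failed decode.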

    From this definition, we can see that a response object never exposes encoded bytes in its body property. Before bytes can be added to the body, they must first be decoded. A client script never has access to what the spec calls "transmitted bytes" (i.e., the actual encoded bytes sent over the wire).

    Decoding is determined exclusively by the Content-Encoding header. There is no mechanism by which client-side JavaScript can manipulate the response headers of a response object, so Content-Encoding must be whatever the server originally sent.

    What your server is doing is wrong. Your only options are:

    1. Fix the behavior of the server.

    2. Run the HTTP response through a proxy that fixes the Content-Encoding response header before it reaches your client.
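
For the second option, a rough sketch of such a proxy in Node.js might look like the following. It rests on several assumptions that are not from the question: the proxy can reach the device over plain HTTP, a body that does not start with the gzip magic bytes (0x1f 0x8b) is taken to be uncompressed (a heuristic), and the names looksGzipped, fixHeaders, createProxy, deviceHost and devicePort are placeholders invented for the example.

```javascript
// Minimal sketch of a proxy that drops a bogus "Content-Encoding: gzip"
// header before the response reaches the browser.
const http = require('node:http');

// Every gzip stream starts with the magic bytes 0x1f 0x8b.
function looksGzipped(body) {
  return body.length >= 2 && body[0] === 0x1f && body[1] === 0x8b;
}

// Return a copy of the device's headers, corrected for the actual body.
function fixHeaders(headers, body) {
  const fixed = { ...headers };
  const claimsGzip = /^\s*gzip\s*$/i.test(String(fixed['content-encoding'] || ''));
  if (claimsGzip && !looksGzipped(body)) {
    delete fixed['content-encoding']; // the body is not really compressed
  }
  // The buffered body is sent in one piece, so recompute the framing headers.
  delete fixed['transfer-encoding'];
  fixed['content-length'] = body.length;
  return fixed;
}

// deviceHost and devicePort are placeholders for the misbehaving device.
function createProxy(deviceHost, devicePort) {
  return http.createServer((clientReq, clientRes) => {
    const upstream = http.request(
      { host: deviceHost, port: devicePort, path: clientReq.url, method: clientReq.method },
      (deviceRes) => {
        // Node's http client does not decode the body, so these are raw bytes.
        const chunks = [];
        deviceRes.on('data', (chunk) => chunks.push(chunk));
        deviceRes.on('end', () => {
          const body = Buffer.concat(chunks);
          clientRes.writeHead(deviceRes.statusCode, fixHeaders(deviceRes.headers, body));
          clientRes.end(body);
        });
      }
    );
    upstream.on('error', () => {
      clientRes.writeHead(502);
      clientRes.end('Upstream device request failed');
    });
    clientReq.pipe(upstream);
  });
}
```

Calling createProxy(IP, 80).listen(8080) on some host and pointing the $.ajax call at that host instead of the device would then return the file with a header the browser can act on. The page either has to be served from the same origin as the proxy, or the proxy has to add the appropriate CORS headers, which this sketch does not do.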