java json gzip httpresponse packetbeat

Java: decompress HTTP gzip content from a JSON attribute


We are working with packetbeat, a network packet analyzer, to capture HTTP requests and HTTP responses. Packetbeat persists these packet events in JSON format. The problem arises when the server uses gzip compression: packetbeat does not decompress the content and saves the raw gzip content directly as a JSON attribute, as you can see here (note: the JSON has been simplified):

{
 {
     ... ,
     "content-type":"application/json;charset=UTF-8",
     "transfer-encoding":"chunked",
     "content-length":6347,
     "x-application-context":"proxy-service:pre,native:8080",
     "content-encoding":"gzip",
     "connection":"keep-alive",
     "date":"Mon, 18 Dec 2017 07:18:23 GMT"
 },
 "body": "\u001f\ufffd\u0008\u0000\u0000\u0000\u0000\u0000\u0000\u0003\ufffd]k\ufffd\u0014DZ\ufffd/\ufffdYI\ufffd#\ufffd*\ufffdo\ufffd\ufffd\ufffd\u0002\t\u0010^\ufffd\u001c\u000eE=\ufffd{\ufffdb\ufffd\ufffdE\ufffd\ufffdC\ufffd\ufffdf\ufffd,\ufffd\u003e\ufffd\ufffd\ufffd\u001ef\u001a\u0008\u0005\ufffd\ufffdg\ufffd\ufffd\ufffdYYU\ufffd\ufffd;\ufffdoN\ufffd\ufffd\ufffdg\ufffd\u0011UdK\ufffd\u0015\u0015\ufffdo\u000eH\ufffd\u000c\u0015Iq\ndC\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd ... "
}

We are thinking of preprocessing the packet JSON files to decompress the content. Could someone tell me what I need in order to decompress the gzipped "body" JSON attribute using Java?


Solution

  • Your data is irrecoverably broken. Generally I would suggest using Base64 encoding for transferring binary data packed into JSON, but if you like experimenting you can read about possible alternatives in "Binary Data in JSON String. Something better than Base64".

    Otherwise, in theory you could just use a variant of String.getBytes() to get an array of bytes, and wrap the result in the streams mentioned in the other answer:

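    import java.io.ByteArrayInputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.GZIPInputStream;

    // ISO-8859-1 maps each char up to 0xFF to exactly one byte
    byte[] bodyBytes = body.getBytes(StandardCharsets.ISO_8859_1);
    ByteArrayInputStream bais = new ByteArrayInputStream(bodyBytes);
    GZIPInputStream gis = new GZIPInputStream(bais); // note the class name: GZIPInputStream
    // <do something with gis here, perhaps use an additional DataInputStream>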
    

    Apart from the String-thing (which is usually not a good idea), this is how you unpack a gzip-compressed array of bytes.
    However, valid gzip data starts with the magic number 0x1F, 0x8B (see Wikipedia, or you can also dig up the actual specification). Your data starts with 0x1F (the \u001f part), but continues with a \ufffd Unicode character, which is the replacement character (see Wikipedia again).
    Some tool encoded the binary data as text and did not like the 0x8B, most probably because it is >= 0x80. If you read further into your JSON, there are many \ufffd-s in it: values greater than or equal to 0x80 have been replaced with it. So the data as it stands is irrecoverably broken, even if JSON supported raw binary data inside strings (which it does not).
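
The points above can be illustrated with a minimal Java sketch. It is not packetbeat's behaviour or API: the class name GzipBodyDemo and its helper methods are hypothetical, and the sample JSON is invented for the demonstration. It shows three things: a check for the gzip magic number, a Base64 round trip that keeps the compressed bytes intact, and how decoding gzip bytes as UTF-8 text replaces the 0x8B byte with U+FFFD, so the original bytes cannot be recovered.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class for illustration; not part of packetbeat or any library.
final class GzipBodyDemo {

    // True if the data starts with the gzip magic number 0x1F, 0x8B.
    static boolean hasGzipMagic(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xFF) == 0x1F
                && (data[1] & 0xFF) == 0x8B;
    }

    // Compresses a byte array with gzip.
    static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Decompresses a gzip-compressed byte array.
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz =
                     new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Invented sample payload standing in for a real response body.
        byte[] original = "{\"status\":\"ok\"}".getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(original);

        // Safe transport: Base64 text survives being stored in a JSON string.
        String base64 = Base64.getEncoder().encodeToString(compressed);
        byte[] restored = gunzip(Base64.getDecoder().decode(base64));
        System.out.println("Base64 round trip: "
                + new String(restored, StandardCharsets.UTF_8));

        // Lossy transport: treating the gzip bytes as UTF-8 text turns
        // invalid sequences such as the lone 0x8B into U+FFFD.
        String asText = new String(compressed, StandardCharsets.UTF_8);
        byte[] mangled = asText.getBytes(StandardCharsets.UTF_8);
        System.out.println("Magic intact before: " + hasGzipMagic(compressed));
        System.out.println("Magic intact after:  " + hasGzipMagic(mangled));
    }
}
```

A preprocessing step could call a check like hasGzipMagic on each decoded body and skip the ones that fail it, rather than letting GZIPInputStream throw on a corrupted header.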