I am trying to run what I expect is a very common use case:
I need to download a gzip file (of complex JSON datasets) from Amazon S3 and decompress (gunzip) it in JavaScript. I have everything working correctly except the final 'inflate' step.
I am using Amazon API Gateway, and have confirmed that the Gateway is properly transferring the compressed file (I used curl and 7-Zip to verify that the data coming out of the API is intact). Unfortunately, when I try to inflate the data in JavaScript with pako, I get errors.
Here is my code (note: response.data is the binary data transferred from AWS):
apigClient.dataGet(params, {}, {})
  .then((response) => {
    console.log(response); // shows response including header and data
    const result = pako.inflate(new Uint8Array(response.data), { to: 'string' });
    // ERROR HERE: 'buffer error'
  }).catch((itemGetError) => {
    console.log(itemGetError);
  });
I also tried a version that splits the binary input into an array of character codes, by adding the following before the inflate:
const charData = response.data.split('').map(function (x) { return x.charCodeAt(0); });
const binData = new Uint8Array(charData);
const result = pako.inflate(binData, { to: 'string' });
// ERROR: incorrect header check
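Since gzip data always starts with the magic bytes 0x1f 0x8b, a quick check of the first two bytes of binData shows whether what pako receives even looks like gzip:

// gzip streams always begin with 0x1f 0x8b; false here means the
// bytes were mangled somewhere before reaching pako
console.log('looks like gzip:', binData[0] === 0x1f && binData[1] === 0x8b);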
I suspect some sort of encoding issue with the data: I am not getting it into a format where the Uint8Array contents are meaningful.
Can anyone point me in the right direction to get this working?
For clarity:
The original file was compressed in Java using GZIPOutputStream with UTF-8 and then stored as a static file (i.e. randomname.gz).
The file is transferred through the AWS Gateway as binary, so it comes out exactly the same as the original file; that is, 'curl --output filename.gz {URLtoS3Gateway}' produces a file identical to the one downloaded directly from S3.
I had the same basic issue when I used the Gateway to encode the binary data as base64, but I did not try much along that path, since it seemed easier to work with the "real" binary data than to add a base64 encode/decode step in the middle. If that step is needed, I can add it back in.
I have also tried some of the example processing found halfway through this issue: https://github.com/nodeca/pako/issues/15, but that didn't help (I might be misunderstanding the binary format vs. array vs. base64 distinctions).
I was able to figure out my own problem. It was related to the format of the data being read in by JavaScript (either JavaScript itself or the Angular HttpClient implementation). I was reading the data in a "binary" format, but it was not the same format pako recognizes. When I read the data in as base64 and then converted it to binary with atob, I got it working.

Here is what I actually implemented, starting from fetching the file from S3 storage:
1) Build an AWS API Gateway that reads a previously stored *.gz file from S3.
At this point you should be able to download a base64 version of your binary file via the URL (test in a browser or with curl).
2) I then had the API Gateway generate the SDK and used the respective apigClient.{get} call.
3) Within the call, translate base64 -> binary string -> Uint8Array, and then decompress/inflate it. My code for that:
apigClient.myDataGet(params, {}, {})
  .then((response) => {
    // HttpClient result is in response.data
    // decode the incoming base64 into a binary string
    const strData = atob(response.data);
    // map each character to its character code (one byte per character)
    const charData = strData.split('').map(function (x) { return x.charCodeAt(0); });
    // pack the codes into a typed byte array
    const binData = new Uint8Array(charData);
    // inflate; pako autodetects the gzip wrapper from the header
    const result = pako.inflate(binData, { to: 'string' });
    console.log(result);
  }).catch((itemGetError) => {
    console.log(itemGetError);
  });
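If you prefer to avoid the split('')/map round-trip, the same base64 -> Uint8Array conversion can be written as a small helper. A minimal sketch (base64ToBytes is just an illustrative name, not part of any SDK):

// hypothetical helper: decode base64 straight into a byte array
function base64ToBytes(b64) {
  const bin = atob(b64);
  const bytes = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) {
    bytes[i] = bin.charCodeAt(i); // each char code is one byte (0-255)
  }
  return bytes;
}

// inside the .then callback, this replaces the three conversion lines:
const result = pako.inflate(base64ToBytes(response.data), { to: 'string' });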
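An alternative I did not pursue: if you call the endpoint with Angular's HttpClient directly instead of the generated SDK, you can request the raw bytes and skip the base64 round-trip entirely. A sketch, assuming the Gateway is configured to return the .gz body as binary, http is an injected HttpClient instance, and the URL is a placeholder for your own endpoint:

// request the response as an ArrayBuffer rather than text
http.get('https://{api-id}.execute-api.{region}.amazonaws.com/prod/mydata',
         { responseType: 'arraybuffer' })
  .subscribe((buf) => {
    // pako accepts the raw bytes directly; no base64 step needed
    const result = pako.inflate(new Uint8Array(buf), { to: 'string' });
    console.log(result);
  });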