I am trying to upload an Excel file to S3 using Node.js and the aws-sdk.
The input is JSON, and I am using the XLSX library to convert it to a workbook and upload it to S3 with the code below.
const wb = XLSX.utils.book_new();
// Convert the JSON body to sheet data
const sheetData = XLSX.utils.json_to_sheet(req.body);
XLSX.utils.book_append_sheet(wb, sheetData, 'Sheet 1');
const sheetDataBuffer = XLSX.write(wb, { bookType: 'xlsx', type: 'buffer', bookSST: false });

const s3 = new AWS.S3();
s3.upload({
  Key: file,
  Bucket: <bucket name>,
  Body: sheetDataBuffer,
  ContentType: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
  ContentEncoding: 'base64'
}).promise()
  .then(data => {
    logger.debug("uploaded the data");
    res.sendStatus(200);
  });
}
However, when I view the uploaded file in S3, the data is garbled/corrupt. What am I missing? Is this a content-encoding issue?
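For reference, one way to sanity-check whether the stored object itself is intact is to read it back from S3 and re-open it with XLSX on the server. This is only a rough sketch using the same bucket/key placeholders as above:

// Sketch: download the object and try to re-open it with XLSX.
// If this parses cleanly, the uploaded bytes are fine and the problem
// is in how the file is read back.
const s3 = new AWS.S3();
s3.getObject({ Bucket: <bucket name>, Key: file }).promise()
  .then(obj => {
    const wb = XLSX.read(obj.Body, { type: 'buffer' });
    logger.debug(wb.SheetNames); // should list 'Sheet 1'
  });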
Update: It looks like the client-side parsing library I am using -- Papaparse -- parses the Excel content as garbled text. I tried setting its encoding option to 'utf-8', but it doesn't help. Any idea what I am missing?
Papa.parse(e.target.files[0], {
  encoding: 'utf-8',
  download: true,
  complete: async data => {
    // data.data is garbled
    const response = await fetch('/<apipath>', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: data.data
    });
    const res = await response;
  }
});
So I figured it out using the XLSX library on the client side (no longer using Papaparse), in case it helps someone.
const reader = new FileReader();
reader.onload = async (evt) => {
  const bstr = evt.target.result;
  const wb = XLSX.read(bstr, { type: 'binary' });
  const wsname = wb.SheetNames[0];
  const ws = wb.Sheets[wsname];
  // data is parsed correctly
  const data = XLSX.utils.sheet_to_json(ws, { header: 1 });
  const response = await fetch('/<apipath>', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(data)
  });
  const res = await response;
};
reader.readAsBinaryString(e.target.files[0]);
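Side note: readAsBinaryString is deprecated in some environments. If that is a concern, the same parsing should also work by reading the file as an ArrayBuffer and using XLSX's 'array' type. This is just a sketch of the equivalent approach; the fetch call stays exactly as above:

const reader = new FileReader();
reader.onload = async (evt) => {
  // evt.target.result is an ArrayBuffer here
  const wb = XLSX.read(new Uint8Array(evt.target.result), { type: 'array' });
  const ws = wb.Sheets[wb.SheetNames[0]];
  const data = XLSX.utils.sheet_to_json(ws, { header: 1 });
  // POST `data` to the API exactly as above
};
reader.readAsArrayBuffer(e.target.files[0]);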