Search code examples
node.jsexcelamazon-s3aws-sdkxlsx

S3 Upload with Excel data using Node.js


I am trying to upload an excel file to S3 using Node.js and aws-sdk

The input is JSON and I am using XLSX library to convert it to a workbook and upload to S3 using the below code.

    const wb = XLSX.utils.book_new();

    //Convert JSON to sheet data
    const sheetData = XLSX.utils.json_to_sheet(req.body);
    
    XLSX.utils.book_append_sheet(wb, sheetData, 'Sheet 1');
    const sheetDataBuffer = XLSX.write(wb, {bookType: 'xlsx', type: 'buffer', bookSST: false});
    const s3 = new AWS.S3();
    s3.upload({
        Key: file,
        Bucket: <bucket name>,
        Body: sheetDataBuffer,
        ContentType: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
        ContentEncoding: 'base64'
    }).promise()
        .then(data => {
            logger.debug("uploaded the data");
            res.sendStatus(200);
    });
}

However, when I view the uploaded file on S3, it shows garbled/corrupt data. What am I missing? Is this a content encoding issue?

Update: It looks like the client side parsing library I am using -- Papaparse -- parses the excel content as garbled. I tried setting the encoding option to 'utf-8' but it doesn't help. Any idea what am I missing?

     Papa.parse(e.target.files[0], {
        encoding: 'utf-8',
        download: true,
        complete: async data => {
            
            // data.data is garbled
      
            const response = await fetch('/<apipath>', {
                method: 'POST',
                headers: {
                    'Content-Type': 'application/json',
                },
                body: data.data
            });
            const res = await response;
        }
    });

Solution

  • So I figured it out using XSLX library on the client side (no longer using Papaparse), in case it helps someone.

        const reader = new FileReader();
        reader.onload = async (evt) => {
            const bstr = evt.target.result;
            const wb = XLSX.read(bstr, {type:'binary'});
            const wsname = wb.SheetNames[0];
            const ws = wb.Sheets[wsname];
    
            //data is parsed correctly
            const data = XLSX.utils.sheet_to_json(ws, {header:1});
            const response = await fetch('/<apipath>', {
                method: 'POST',
                headers: {
                    'Content-Type': 'application/json',
                },
                body: JSON.stringify(data)
            });
            const res = await response;
        };
        reader.readAsBinaryString(e.target.files[0]);