Tags: node.js, amazon-web-services, pdf, amazon-s3, aws-lambda

AWS S3 PDF file upload via PutObjectCommand results in empty pdf


I have an API Gateway with a Lambda and an S3 bucket. All I am trying to do is upload a PDF file as form-data in a POST request to the API.

When I upload a very simple PDF, for example https://www.africau.edu/images/default/sample.pdf, it works fine and all the content is reflected correctly. However, if the PDF is anything more complex, I get an empty PDF file with the same number of pages but no content. I am not sure what is causing this.

This is how I am receiving the API request. I compared the raw content via textcompare and the raw content is identical:


import { parse } from "parse-multipart-data"

const contentType = event.headers["Content-Type"] || event.headers["content-type"]
const boundary = contentType?.split("boundary=")[1]
const parts = parse(Buffer.from(eventBody), boundary)
let result = {
  file: {},
} as FileRequestDecoded

for (let i = 0; i < parts.length; i++) {
  const part = parts[i]
  if (part.type === "application/pdf") {
    // Get only one file to upload
    console.log("pdf found ", part.filename)

    result.file = {
      id: `${fileId}-${part.filename!}`.toLowerCase(),
      name: part.filename!,
      data: part.data, // This is the file data
    }
  }
}
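
One thing that may be worth ruling out at this step, offered as an assumption because the API Gateway settings are not shown: `Buffer.from(eventBody)` treats the string as UTF-8 text. If the body arrives base64-encoded, or if the binary payload has already been converted to text on the way in, the bytes in `part.data` will not match the original file, which could explain pages that keep their structure but lose their content. The sketch below uses a hypothetical helper, `decodeEventBody`, whose two parameters mirror the `body` and `isBase64Encoded` fields of a Lambda proxy event; it is not part of the original code.

```typescript
// Hypothetical helper (not from the original code): turns a Lambda proxy
// event body into raw bytes before it is handed to the multipart parser.
function decodeEventBody(body: string, isBase64Encoded: boolean): Buffer {
  // A binary payload is delivered base64-encoded when isBase64Encoded is true.
  // Otherwise latin1 maps every character back to exactly one byte.
  return Buffer.from(body, isBase64Encoded ? "base64" : "latin1")
}

// Sample bytes: "%PDF" followed by four bytes that are not valid UTF-8,
// similar to what appears in the binary streams of a complex PDF.
const original = Buffer.from([0x25, 0x50, 0x44, 0x46, 0xe2, 0xe3, 0xcf, 0xd3])

// Base64 transport keeps every byte intact.
const viaBase64 = decodeEventBody(original.toString("base64"), true)

// A UTF-8 round trip replaces the invalid bytes, so the data no longer matches.
const viaUtf8 = Buffer.from(original.toString("utf8"))

console.log(viaBase64.equals(original)) // true
console.log(viaUtf8.equals(original)) // false
```

If this applies, the parser call would become something like `parse(decodeEventBody(event.body, event.isBase64Encoded), boundary)`. Whether the body actually arrives base64-encoded depends on how binary media types are configured on the API, so treat this as a check rather than a confirmed fix.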

This is how I am uploading to S3 (probably the part I am getting wrong):

await this.s3.putBufferData(
  this.bucketName,
  `pdfs/${fileData.file.id}`,
  fileData.file.data
)

async putBufferData(bucketName: string, key: string, data: Buffer) {
  const command = new PutObjectCommand({
    Bucket: bucketName,
    Key: key,
    Body: data,
    ContentType: 'application/pdf',
  })

  await this.s3Client.send(command)
}



Solution

  • Another approach works better with complex PDF documents (application/pdf).

    Instead of passing binary data through API Gateway, you can use a presigned URL. Your request through API Gateway invokes an S3 operation that generates a presigned URL, which is returned to the client.

    The client then uses that presigned URL to upload the PDF directly to the S3 bucket. A very similar example app, named below, shows this logic.

    The only difference is that the example app uploads images rather than PDF documents to the S3 bucket.

    Create a photo asset management application that lets users manage photos using labels

    More background info about this sample serverless app can be found here:

    Cloud Journeys: Building a Serverless Image Recognition Website with Machine Learning
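
    The presigned URL flow described above could look roughly like the following. This is a minimal sketch, not code from the linked example app: it assumes the AWS SDK for JavaScript v3 packages `@aws-sdk/client-s3` and `@aws-sdk/s3-request-presigner`, and the function names `createPdfUploadUrl` and `uploadPdf`, the 300-second expiry, and the bucket and key values are hypothetical.

```typescript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3"
import { getSignedUrl } from "@aws-sdk/s3-request-presigner"

// Lambda side (hypothetical helper): creates a short-lived URL that allows
// a single PUT of a PDF to the given bucket and key.
async function createPdfUploadUrl(
  s3Client: S3Client,
  bucketName: string,
  key: string
): Promise<string> {
  const command = new PutObjectCommand({
    Bucket: bucketName,
    Key: key,
    ContentType: "application/pdf",
  })
  // expiresIn is in seconds; 300 is an arbitrary choice for this sketch.
  return getSignedUrl(s3Client, command, { expiresIn: 300 })
}

// Client side (hypothetical helper): sends the file bytes straight to S3,
// so the PDF never passes through API Gateway or the Lambda.
async function uploadPdf(uploadUrl: string, pdf: Blob): Promise<void> {
  const response = await fetch(uploadUrl, {
    method: "PUT",
    // Send the same content type that was used when creating the URL.
    headers: { "Content-Type": "application/pdf" },
    body: pdf,
  })
  if (!response.ok) {
    throw new Error(`Upload failed with status ${response.status}`)
  }
}
```

    With this split, the Lambda would only return the string from `createPdfUploadUrl` in its JSON response, and the client would call `uploadPdf` with that URL and the selected file.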