I have an API Gateway with a Lambda and an S3 bucket. All I am trying to do is upload a PDF file via an API POST request using form-data.
When I upload a very simple PDF, for example https://www.africau.edu/images/default/sample.pdf, it works fine and all the content comes through correctly. However, if the PDF is anything more complex, I get an empty PDF file with the same number of pages but no content. I am not sure what is causing this.
This is how I am receiving the API request. I compared the raw content via textcompare and it is identical:
import { parse } from "parse-multipart-data"

const contentType = event.headers["Content-Type"] || event.headers["content-type"]
const boundary = contentType?.split("boundary=")[1]
const parts = parse(Buffer.from(eventBody), boundary)
let result = {
  file: {},
} as FileRequestDecoded
for (let i = 0; i < parts.length; i++) {
  const part = parts[i]
  if (part.type === "application/pdf") {
    // Get only one file to upload
    console.log("pdf found ", part.filename)
    result.file = {
      id: `${fileId}-${part.filename!}`.toLowerCase(),
      name: part.filename!,
      data: part.data, // This is the file data
    }
  }
}
This is how I am uploading to S3 (probably the part I am screwing up):
await this.s3.putBufferData(
  this.bucketName,
  `pdfs/${fileData.file.id}`,
  fileData.file.data
)

async putBufferData(bucketName: string, key: string, data: Buffer) {
  const command = new PutObjectCommand({
    Bucket: bucketName,
    Key: key,
    Body: data,
    ContentType: 'application/pdf',
  })
  await this.s3Client.send(command)
}
There is another approach you can take that will work better with complex PDF documents (application/pdf).
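One plausible reason complex PDFs come out blank, assuming the request body reaches the Lambda as text rather than as intact binary: any byte that is not valid UTF-8 is replaced during the string round trip. A simple PDF with plain ASCII content survives this, while compressed streams do not, even though the page structure stays readable. The following is only a small sketch of that effect using Node's Buffer; the byte values are an illustrative PDF header, not data taken from the question.

```typescript
import { Buffer } from "node:buffer";

// Illustrative bytes: "%PDF-1.4\n%" followed by four bytes above 0x7F,
// similar to the binary marker line many PDF writers emit.
const original = Buffer.from([
  0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x34, 0x0a, 0x25,
  0xe2, 0xe3, 0xcf, 0xd3,
]);

// Round trip through a UTF-8 string: invalid sequences become U+FFFD,
// so the original bytes cannot be recovered.
const viaUtf8 = Buffer.from(original.toString("utf8"), "utf8");

// Round trip through base64: every byte is preserved.
const viaBase64 = Buffer.from(original.toString("base64"), "base64");

console.log(viaUtf8.equals(original));   // false
console.log(viaBase64.equals(original)); // true
```

If the event does arrive base64-encoded, decoding it with Buffer.from(body, "base64") before parsing would keep the bytes intact; whether that applies here depends on how the API is configured.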
Instead of passing binary data through API Gateway, you can use a presigned URL. That is, your request through API Gateway invokes an S3 operation that generates a presigned URL, which is returned to the client.
The client can then use that presigned URL to upload the PDF directly to the S3 bucket. There is a very similar example app here that shows this logic.
The difference is that this app uploads images, as opposed to PDF documents, to the S3 bucket.
Create a photo asset management application that lets users manage photos using labels
More background info about this sample serverless app can be found here:
Cloud Journeys: Building a Serverless Image Recognition Website with Machine Learning
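The presigned URL flow described above could look roughly like the following with the AWS SDK for JavaScript v3. This is a minimal sketch, not the code from the linked sample app: the function names createPdfUploadUrl and uploadPdf, the 300-second expiry, and the way the client obtains the URL are all assumptions made for illustration.

```typescript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

// Lambda side (hypothetical helper): build a presigned PUT URL for one PDF object.
// No file bytes pass through API Gateway; only this URL is returned to the client.
export async function createPdfUploadUrl(
  s3: S3Client,
  bucketName: string,
  key: string,
  expiresInSeconds = 300, // assumed lifetime; pick what suits your app
): Promise<string> {
  const command = new PutObjectCommand({
    Bucket: bucketName,
    Key: key,
    ContentType: "application/pdf",
  });
  return getSignedUrl(s3, command, { expiresIn: expiresInSeconds });
}

// Client side (hypothetical helper): PUT the PDF straight to S3 using that URL.
// In a browser, a File from an <input type="file"> is a Blob.
export async function uploadPdf(uploadUrl: string, pdf: Blob): Promise<void> {
  const response = await fetch(uploadUrl, {
    method: "PUT",
    headers: { "Content-Type": "application/pdf" },
    body: pdf,
  });
  if (!response.ok) {
    throw new Error(`Upload failed with status ${response.status}`);
  }
}
```

In the Lambda you would call createPdfUploadUrl with the existing S3 client, bucket name, and a key such as `pdfs/${fileId}`, then return the URL in the JSON response. The Lambda's role needs permission to put objects in the bucket, and a browser client also needs a CORS rule on the bucket that allows PUT.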