Tags: node.js, stream, aws-sdk-nodejs, aws-sdk-js-v3

How to upload a stream to S3 with AWS SDK v3


I have to transfer a file from an API endpoint to two different buckets. The original upload is made using:

curl -X PUT -F "data=@sample" "http://localhost:3000/upload/1/1"

The endpoint where the file is uploaded:

const PassThrough = require('stream').PassThrough;

async function uploadFile (req, res) {
  try {
    const firstS3Stream = new PassThrough();
    const secondS3Stream = new PassThrough();
    req.pipe(firstS3Stream);
    req.pipe(secondS3Stream);

    await Promise.all([
      uploadToFirstS3(firstS3Stream),
      uploadToSecondS3(secondS3Stream),
    ]);
    return res.end();
  } catch (err) {
    console.log(err)
    return res.status(500).send({ error: 'Unexpected error during file upload' });
  }
}

As you can see, I use two PassThrough streams to duplicate the request stream into two readable streams, as suggested in this SO thread.

This piece of code remains unchanged; what is interesting here are the uploadToFirstS3 and uploadToSecondS3 functions. In this minimal example both do exactly the same thing with a different configuration, so I will expand on only one of them here.

What works well:

const aws = require('aws-sdk');

const s3 = new aws.S3({
  accessKeyId: S3_API_KEY,
  secretAccessKey: S3_API_SECRET,
  region: S3_REGION,
  signatureVersion: 'v4',
});

const uploadToFirstS3 = (stream) => (new Promise((resolve, reject) => {
  const uploadParams = {
    Bucket: S3_BUCKET_NAME,
    Key: 'some-key',
    Body: stream,
  };
  s3.upload(uploadParams, (err) => {
    if (err) return reject(err);
    resolve(true);
  });
}));

This piece of code (based on the aws-sdk package) works fine. My issue is that I want it to run with the @aws-sdk/client-s3 package in order to reduce the size of the project.

What doesn't work:

I first tried to use S3Client.send(PutObjectCommand):

const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');

const s3 = new S3Client({
  credentials: {
    accessKeyId: S3_API_KEY,
    secretAccessKey: S3_API_SECRET,
  },
  region: S3_REGION,
  signatureVersion: 'v4',
});

const uploadToFirstS3 = (stream) => (new Promise((resolve, reject) => {
  const uploadParams = {
    Bucket: S3_BUCKET_NAME,
    Key: 'some-key',
    Body: stream,
  };
  s3.send(new PutObjectCommand(uploadParams), (err) => {
    if (err) return reject(err);
    resolve(true);
  });
}));

Then I tried S3.putObject(PutObjectCommandInput):

const { S3 } = require('@aws-sdk/client-s3');

const s3 = new S3({
  credentials: {
    accessKeyId: S3_API_KEY,
    secretAccessKey: S3_API_SECRET,
  },
  region: S3_REGION,
  signatureVersion: 'v4',
});

const uploadToFirstS3 = (stream) => (new Promise((resolve, reject) => {
  const uploadParams = {
    Bucket: S3_BUCKET_NAME,
    Key: 'some-key',
    Body: stream,
  };
  s3.putObject(uploadParams, (err) => {
    if (err) return reject(err);
    resolve(true);
  });
}));

The last two examples both give me a 501 - Not Implemented error mentioning the Transfer-Encoding header. I checked req.headers and there is no Transfer-Encoding in it, so I guess the SDK adds it to the request sent to S3?

Since the first example (based on aws-sdk) works fine, I'm sure the error is not due to an empty body in the request as suggested in this SO thread.

Still, I thought maybe the stream wasn't readable yet when the upload was triggered, so I wrapped the calls to uploadToFirstS3 and uploadToSecondS3 in a callback triggered by the req.on('readable', callback) event, but nothing changed.
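
For reference, that attempt looked roughly like this (a simplified sketch of the wrapping only; everything else stayed the same):

req.once('readable', () => {
  // Same uploads as before, just deferred until the request stream becomes readable.
  Promise.all([
    uploadToFirstS3(firstS3Stream),
    uploadToSecondS3(secondS3Stream),
  ])
    .then(() => res.end())
    .catch(() => res.status(500).send({ error: 'Unexpected error during file upload' }));
});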

I would like to process the files in memory, without storing them on disk at any point. Is there a way to achieve this using the @aws-sdk/client-s3 package?


Solution

  • In v3 you can use the Upload class from @aws-sdk/lib-storage to do multipart uploads. Unfortunately, there seems to be no mention of this on the docs site for @aws-sdk/client-s3.

    It's mentioned in the upgrade guide here: https://github.com/aws/aws-sdk-js-v3/blob/main/UPGRADING.md#s3-multipart-upload

    Here's a corrected version of the example provided in https://github.com/aws/aws-sdk-js-v3/tree/main/lib/lib-storage:

      import { Upload } from "@aws-sdk/lib-storage";
      import { S3Client } from "@aws-sdk/client-s3";
    
      const target = { Bucket, Key, Body };
      try {
        const parallelUploads3 = new Upload({
          client: new S3Client({}),
          tags: [...], // optional tags
          queueSize: 4, // optional concurrency configuration
          leavePartsOnError: false, // optional manually handle dropped parts
          params: target,
        });
    
        parallelUploads3.on("httpUploadProgress", (progress) => {
          console.log(progress);
        });
    
        await parallelUploads3.done();
      } catch (e) {
        console.log(e);
      }
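
    Applied to the question's setup, a minimal sketch of uploadToFirstS3 built on Upload could look like the following (it reuses the question's S3_API_KEY, S3_API_SECRET, S3_REGION and S3_BUCKET_NAME placeholders and is untested, not a drop-in implementation):

      import { S3Client } from "@aws-sdk/client-s3";
      import { Upload } from "@aws-sdk/lib-storage";

      // signatureVersion is no longer a client option in v3; SigV4 is the default.
      const s3 = new S3Client({
        credentials: {
          accessKeyId: S3_API_KEY,
          secretAccessKey: S3_API_SECRET,
        },
        region: S3_REGION,
      });

      const uploadToFirstS3 = async (stream) => {
        // A Node Readable (e.g. the PassThrough from the question) is an accepted Body type.
        const upload = new Upload({
          client: s3,
          params: {
            Bucket: S3_BUCKET_NAME,
            Key: "some-key",
            Body: stream,
          },
        });
        return upload.done();
      };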
    

    At the time of writing, the following Body types are supported:

    • string
    • Uint8Array
    • Buffer
    • Blob (hence also File)
    • Node Readable
    • ReadableStream

    (according to https://github.com/aws/aws-sdk-js-v3/blob/main/lib/lib-storage/src/chunker.ts)

    However, if the Body object comes from a polyfill or a separate realm, and thus isn't strictly an instanceof one of these types, you will get an error. You can work around such a case by cloning the Uint8Array/Buffer or by piping the stream through a PassThrough. For example, if you are using archiver to upload a .zip or .tar archive, you can't pass the archiver stream directly because it is a userland Readable implementation (at the time of writing), so you must do Body: archive.pipe(new PassThrough()), as in the sketch below.
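
    A sketch of that archiver workaround, assuming the archiver package is installed and an ESM/top-level-await context like the example above (the bucket, key and file contents are made-up placeholders):

      import { PassThrough } from "stream";
      import archiver from "archiver";
      import { S3Client } from "@aws-sdk/client-s3";
      import { Upload } from "@aws-sdk/lib-storage";

      const archive = archiver("zip");
      // Piping through a core stream.PassThrough gives lib-storage a genuine Node Readable.
      const body = archive.pipe(new PassThrough());

      const upload = new Upload({
        client: new S3Client({}),
        params: { Bucket: "my-bucket", Key: "archive.zip", Body: body }, // placeholder bucket/key
      });

      archive.append("hello world", { name: "hello.txt" }); // placeholder archive entry
      archive.finalize();

      await upload.done();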