java amazon-web-services amazon-s3 aws-sdk inputstream

Download file from URL and upload it to AWS S3 without saving into memory using AWS SDK for Java, version 2

I am writing code that downloads a file from a URL and uploads it to S3, but I don't want it to be stored temporarily in a file or in memory. I am downloading through an 'InputStream', but AWS S3 requires the file size, which I can't get from the 'InputStream'. Is there any other way? I found this discussion on the same topic using 'Node.js'.

My code to fetch the file as an InputStream:

HttpClient client = HttpClient.newBuilder().build();
URI uri = URI.create("{myUrl}");
HttpRequest request = HttpRequest.newBuilder().uri(uri).build();
InputStream is = client.send(request, HttpResponse.BodyHandlers.ofInputStream()).body();

The code I tried for uploading to S3, for which I don't have content_length:

S3Client s3Client = S3Client.builder().build();
PutObjectRequest objectRequest = PutObjectRequest.builder()
                            .bucket(BUCKET_NAME)
                            .key(KEY)
                            .build();

PutObjectResponse por = s3Client.putObject(objectRequest, RequestBody.fromInputStream(is,content_length));

Solution

You have a few options.

The easiest is to retain the HttpResponse returned by client.send() and read the Content-Length header from it. You should also look for headers such as Content-Type and store them as metadata on the S3 object.
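As a rough sketch of the header lookup, assuming the Java 11+ java.net.http client from the question: the class name ResponseMetadata and its two helper methods are made up for illustration, while HttpHeaders.firstValueAsLong and HttpHeaders.firstValue are standard JDK calls.

```java
import java.net.http.HttpHeaders;
import java.util.Optional;
import java.util.OptionalLong;

// Hypothetical helper class; only the HttpHeaders calls are JDK API.
public class ResponseMetadata {

    /** Returns the declared body size, or empty if the server sent no Content-Length header. */
    public static OptionalLong contentLength(HttpHeaders headers) {
        // Header names are matched case-insensitively.
        // Throws NumberFormatException if the header value is not a valid long.
        return headers.firstValueAsLong("Content-Length");
    }

    /** Returns the declared media type, or empty if the server sent no Content-Type header. */
    public static Optional<String> contentType(HttpHeaders headers) {
        return headers.firstValue("Content-Type");
    }
}
```

To use it with the code from the question, keep the whole response (HttpResponse<InputStream> response = client.send(...)) instead of calling .body() straight away, pass response.headers() to these helpers, and read the stream from response.body(). If the length is present it can go into RequestBody.fromInputStream(is, length), and the content type can be set on the PutObjectRequest builder with contentType(...).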

That isn't guaranteed to work in all cases: some servers do not provide Content-Length. In that case you need to create a multipart upload to send the file. When doing this, you buffer relatively small chunks in memory (every part except the last must be at least 5 MB) and can upload up to 10,000 parts. You must either complete or abort the upload, or configure a lifecycle rule on your bucket to delete incomplete uploads after a certain period of time; otherwise you'll be charged for the storage used by the parts already uploaded.
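The buffering side of a multipart upload can be sketched without the SDK. The example below assumes Java 9+ (for InputStream.readNBytes); StreamChunker, PartSink and forEachPart are hypothetical names, and the sink stands in for whatever code actually sends a part to S3.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper: splits a stream of unknown length into fixed-size parts.
public class StreamChunker {

    /** Hypothetical callback; real code would upload the bytes as one part here. */
    public interface PartSink {
        void accept(int partNumber, byte[] data) throws IOException;
    }

    /** S3's minimum size for every part except the last one. */
    public static final int MIN_PART_SIZE = 5 * 1024 * 1024;

    /**
     * Reads the stream in chunks of at most partSize bytes and hands each chunk
     * to the sink. Part numbers start at 1. Returns the number of parts produced;
     * an empty stream produces zero parts, which the caller has to handle.
     */
    public static int forEachPart(InputStream in, int partSize, PartSink sink) throws IOException {
        byte[] buffer = new byte[partSize];
        int partNumber = 0;
        while (true) {
            // Blocks until the buffer is full or the stream ends.
            int read = in.readNBytes(buffer, 0, partSize);
            if (read == 0) {
                break; // nothing left to send
            }
            partNumber++;
            sink.accept(partNumber, Arrays.copyOf(buffer, read));
            if (read < partSize) {
                break; // short read means end of stream: this was the last part
            }
        }
        return partNumber;
    }
}
```

Only one buffer of partSize bytes (plus the copy handed to the sink) is held at a time, so memory use stays bounded regardless of the file size. With the V2 SDK, the surrounding flow would be createMultipartUpload before the loop, uploadPart inside the sink (collecting each part's ETag), completeMultipartUpload afterwards, and abortMultipartUpload in a catch block so a failed transfer does not leave parts behind.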

A third alternative is to use the V1 SDK, which gives you TransferManager. That handles the multipart upload for you, and uses multiple threads to improve throughput. When this answer was written it had not yet been implemented for V2; later V2 releases added an S3TransferManager.