How can one upload a large string or an array of bytes to an AWS S3 bucket in parts using Golang?
For instance, if I have a string with billions of characters and I want to avoid uploading it all at once due to memory constraints, how can I achieve this?
My use case involves exporting specific data from a database, encoding that data as JSON objects (stored as strings), and sequentially uploading these parts to an AWS S3 bucket until the complete JSON file is generated and uploaded. The intention is for the final file uploaded to S3 to be downloadable in JSON format.
I encountered this issue a few days ago and am sharing the question along with its answer here to assist other users who might face the same or a similar challenge. If you believe there are any further improvements needed for clarity, please let me know!
The process of uploading large data to an AWS S3 bucket in parts can be achieved using the s3manager package. The key is to use the Body field of the s3manager.UploadInput struct, which accepts an io.Reader.
You can use io.Pipe, which provides you with a reader (which you attach to the Body field of UploadInput) and a writer (which you write each chunk to). As you write each chunk, it is streamed to your S3 bucket. Remember to handle any errors that arise during the upload, and don't forget to close the writer so S3 can finish the upload.
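In isolation, the pattern looks roughly like this (a minimal sketch where the S3 upload is stood in for by io.Copy to io.Discard, just to show the reader/writer handoff):

package main

import (
    "fmt"
    "io"
)

func main() {
    pr, pw := io.Pipe()

    // The consumer reads from the pipe in a goroutine; in the real code this is
    // where s3manager reads the Body of the upload.
    done := make(chan int64)
    go func() {
        n, _ := io.Copy(io.Discard, pr) // stand-in for the S3 upload
        done <- n
    }()

    // The producer writes chunks; each Write is handed over to the reader side.
    for _, chunk := range []string{"part-1 ", "part-2 ", "part-3"} {
        if _, err := pw.Write([]byte(chunk)); err != nil {
            panic(err)
        }
    }
    pw.Close() // signals EOF to the consumer

    fmt.Println("bytes consumed:", <-done)
}

The full S3 version below follows the same shape: the uploader goroutine takes the place of io.Copy, and the main goroutine writes chunks retrieved from the database.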
Here's a working example you can start with:
package main

import (
    "fmt"
    "io"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/credentials"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3/s3manager"
)
const (
    // Use environment variables for credentials instead of hard-coding them.
    AccessKey = "your-access-key"
    SecretKey = "your-secret-key"

    // Your bucket region and name.
    S3Region = "your-bucket-region" // e.g. eu-central-1
    S3Bucket = "your-bucket-name"

    // The number of bytes per chunk. Change this according to your case; this is
    // just an example value because here we are creating chunks from a short string.
    // You can use something like 10 * 1024 * 1024 to set the chunk size to 10 MB.
    ChunkSize = 50
)
func main() {
    // Create an AWS session.
    sess, err := session.NewSession(&aws.Config{
        Region:      aws.String(S3Region),
        Credentials: credentials.NewStaticCredentials(AccessKey, SecretKey, ""),
    })
    if err != nil {
        panic(err)
    }

    // Create the pipe and start the uploader in a goroutine; it reads from pr
    // while we write chunks to pw.
    pr, pw := io.Pipe()
    errch := make(chan error)
    go chunkUploader(sess, "example_file_1", errch, pr)

    // Retrieve the first chunk of data from the "database".
    chunk, skip := retrieveNextChunk(0, ChunkSize)
    for len(chunk) > 0 {
        // Writing to the pipe hands the chunk over to the uploader goroutine.
        if _, err := pw.Write(chunk); err != nil {
            break // the upload failed; the error is received from errch below
        }
        // Retrieve the next chunk from the "database" and the new skip value
        // for the next iteration.
        chunk, skip = retrieveNextChunk(skip, ChunkSize)
    }

    // Close the writer - this tells S3 to finish uploading your file, which will
    // then appear in your bucket's object list.
    pw.Close()

    // Check for upload errors.
    if err := <-errch; err != nil {
        panic(err)
    }
    fmt.Println("Data successfully uploaded")
}
// retrieveNextChunk is an example function for retrieving a part of the data
// from your database.
func retrieveNextChunk(skip int, limit int) ([]byte, int) {
    fulldata := "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
    var chunk string
    if skip+limit > len(fulldata) {
        chunk = fulldata[skip:]
    } else {
        chunk = fulldata[skip : skip+limit]
    }
    return []byte(chunk), skip + len(chunk)
}
// chunkUploader uploads everything written to the pipe to the S3 object "key".
func chunkUploader(sess *session.Session, key string, errch chan<- error, reader *io.PipeReader) {
    _, err := s3manager.NewUploader(sess).Upload(&s3manager.UploadInput{
        Bucket:             aws.String(S3Bucket),
        Key:                aws.String(key),
        Body:               reader,
        ContentDisposition: aws.String("attachment"), // or "inline" - the file will be displayed in the browser if possible
        ContentType:        aws.String("text/plain"), // change this to your content type, for example application/json
    })
    errch <- err
}
Feel free to adapt this code to your specific scenario. This approach allows you to upload large data to S3 in manageable chunks, avoiding memory issues and ensuring a smoother upload process.
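For the JSON use case described in the question, one possible adaptation (a sketch only - Row and fetchRows are hypothetical placeholders for your own record type and paginated query) is to let json.Encoder write straight to the pipe writer instead of building string chunks yourself. It slots into the example above in place of the retrieveNextChunk loop (add encoding/json to the imports):

// Row stands in for whatever record type your database export produces.
type Row struct {
    ID   int    `json:"id"`
    Name string `json:"name"`
}

// fetchRows is a hypothetical placeholder for your paginated database query;
// it should return an empty slice once there is nothing left to export.
func fetchRows(offset, limit int) []Row {
    // ... your real query goes here ...
    return nil
}

// streamJSON writes one JSON array to the pipe; the chunkUploader goroutine
// reads it on the other end and streams it to S3.
func streamJSON(pw *io.PipeWriter) {
    enc := json.NewEncoder(pw)
    pw.Write([]byte("["))
    first, offset := true, 0
    for {
        rows := fetchRows(offset, 1000)
        if len(rows) == 0 {
            break
        }
        for _, row := range rows {
            if !first {
                pw.Write([]byte(","))
            }
            first = false
            if err := enc.Encode(row); err != nil {
                pw.CloseWithError(err) // aborts the upload and reports the error
                return
            }
        }
        offset += len(rows)
    }
    pw.Write([]byte("]"))
    pw.Close() // tells s3manager the body is complete
}

If you go this route, also change the ContentType in chunkUploader to application/json, as noted in the comment above.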