amazon-web-services, go, amazon-s3, aws-sdk-go

Upload string (or array of bytes) in parts to AWS S3 in Golang


How can one upload a large string or an array of bytes to an AWS S3 bucket in parts using Golang?

For instance, if I have a string with billions of characters and I want to avoid uploading it all at once due to memory constraints, how can I achieve this?

My use case involves exporting specific data from a database, encoding that data as JSON objects (stored as strings), and sequentially uploading these parts to an AWS S3 bucket until the complete JSON file is generated and uploaded. The intention is for the final file uploaded to S3 to be downloadable in JSON format.

I encountered this issue a few days ago and am sharing the question along with its answer here to assist other users who might face the same or a similar challenge. If you believe there are any further improvements needed for clarity, please let me know!


Solution

  • Uploading large data to an AWS S3 bucket in parts can be done with the s3manager package. The key is the Body field of the s3manager.UploadInput struct, which accepts an io.Reader.

    You can use io.Pipe, which gives you a reader (which you’ll attach to the Body field of UploadInput) and a writer (which you’ll write each chunk to). As you write each chunk, it is uploaded to your S3 bucket. Remember to handle any errors that arise during the upload, and don’t forget to close the writer so S3 can finish the upload. (Two short follow-up sketches after the example cover tuning the uploader and a JSON-encoding variant.)

    Here’s a working example you can start with:

    package main
    
    import (
        "fmt"
        "io"
    
        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/credentials"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/s3/s3manager"
    )
    
    const (
        // Use environment variables or an IAM role for the credentials instead of hard-coding them
        AccessKey = "your-access-key"
        SecretKey = "your-secret-key"
    
        // Your bucket region and name
        S3Region  = "your-bucket-region" // e.g. eu-central-1
        S3Bucket  = "your-bucket-name"
    
        // The number of bytes per chunk. Change this according to your case; this small
        // value is only an example because the chunks here come from a string.
        // Use something like 10 * 1024 * 1024 to set the chunk size to 10 MB.
        ChunkSize = 50
    )
    
    func main() {
        // create an aws session
        sess, err := session.NewSession(&aws.Config{
            Region:      aws.String(S3Region),
            Credentials: credentials.NewStaticCredentials(AccessKey, SecretKey, ""),
        })
        if err != nil {
            panic(err)
        }
    
        // create a pipe and start the uploader in a separate goroutine
        pr, pw := io.Pipe()
        errch := make(chan error)
        go chunkUploader(sess, "example_file_1", errch, pr)
    
        // retrieve the first chunk of data from the "database"
        chunk, skip := retrieveNextChunk(0, ChunkSize)
        for {
            if len(chunk) == 0 {
                break
            }
    
            // write the chunk to the pipe - the uploader goroutine reads it and uploads it
            if _, err := pw.Write(chunk); err != nil {
                // the upload failed and the read side of the pipe was closed; stop
                // producing chunks - the actual upload error is read from errch below
                break
            }

            // this retrieves new data from the "database" and saves it as a new chunk and
            // a new skip value for the next retrieval
            chunk, skip = retrieveNextChunk(skip, ChunkSize)
        }
    
        // close the writer - this tells S3 to finish uploading your file which will
        // then appear in your bucket object list page
        pw.Close()
    
        // wait for the uploader goroutine and check for errors
        err = <-errch
        if err != nil {
            panic(err)
        }
        fmt.Println("Data successfully uploaded")
    }
    
    // this is an example function for retrieving a part of data from your database
    func retrieveNextChunk(skip int, limit int) ([]byte, int) {
        fulldata := "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
    
        var chunk string
        if skip+limit > len(fulldata) {
            chunk = fulldata[skip:]
        } else {
            chunk = fulldata[skip : skip+limit]
        }
        return []byte(chunk), skip + len(chunk)
    }
    
    func chunkUploader(sess *session.Session, key string, errch chan<- error, reader *io.PipeReader) {
        _, err := s3manager.NewUploader(sess).Upload(&s3manager.UploadInput{
            Bucket:             aws.String(S3Bucket),
            Key:                aws.String(key),
            Body:               reader,
            ContentDisposition: aws.String("attachment"), // or "inline" = the file will be displayed in the browser if possible
            ContentType:        aws.String("text/plain"), // change this to your content type, for example application/json
        })
        if err != nil {
            // close the read side so a pending pw.Write in main doesn't block forever
            reader.CloseWithError(err)
        }
        errch <- err
    }
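
    A note on tuning: ChunkSize above only controls how much data you pull from the database per pw.Write call. The s3manager uploader has its own part size and concurrency settings for the multipart upload to S3, which you can set through the options function accepted by NewUploader. A minimal sketch (the values are just example numbers, not recommendations):

    uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
        u.PartSize = 10 * 1024 * 1024 // bytes per S3 part; the minimum S3 allows is 5 MB
        u.Concurrency = 3             // number of parts uploaded in parallel
    })

    The uploader buffers up to PartSize bytes from the pipe before sending each part, so the size of the chunks you write and the part size S3 sees don’t have to match.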
    
    

    Feel free to adapt this code to your specific scenario. This approach allows you to upload large data to S3 in manageable chunks, avoiding memory issues and ensuring a smoother upload process.
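
    Since the use case in the question encodes database rows as JSON, here is a rough sketch of what the producing side could look like with encoding/json streaming straight into the pipe writer instead of hand-made string chunks. Row and fetchNextRows are hypothetical placeholders for your own types and data-access code, and you’d need "encoding/json" added to the imports; the important parts are that json.Encoder writes directly to the *io.PipeWriter and that pw.CloseWithError aborts the upload if a database read fails mid-stream:

    // Row is a hypothetical placeholder for your database row type.
    type Row struct {
        ID   int    `json:"id"`
        Name string `json:"name"`
    }

    // streamRowsAsJSON writes rows to the pipe as newline-delimited JSON objects.
    // fetchNextRows is a hypothetical helper returning the next batch of rows and
    // whether more rows remain.
    func streamRowsAsJSON(pw *io.PipeWriter, fetchNextRows func(offset, limit int) ([]Row, bool, error)) {
        enc := json.NewEncoder(pw)

        offset, limit := 0, 1000
        for {
            rows, more, err := fetchNextRows(offset, limit)
            if err != nil {
                // abort: the uploader's Upload call returns this error via errch
                pw.CloseWithError(err)
                return
            }

            for _, row := range rows {
                // Encode appends a newline after each object (JSON Lines); producing a
                // single JSON array instead would need manual bracket/comma handling
                if err := enc.Encode(row); err != nil {
                    pw.CloseWithError(err)
                    return
                }
            }

            if !more {
                break
            }
            offset += limit
        }

        // signal S3 that the body is complete
        pw.Close()
    }

    If you go this route, remember to change ContentType in chunkUploader to application/json (or application/x-ndjson for JSON Lines).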