How can one upload a large string or an array of bytes to an AWS S3 bucket in parts using Golang?
For instance, if I have a string with billions of characters and I want to avoid uploading it all at once due to memory constraints, how can I achieve this?
My use case involves exporting specific data from a database, encoding that data as JSON objects (stored as strings), and sequentially uploading these parts to an AWS S3 bucket until the complete JSON file is generated and uploaded. The intention is for the final file uploaded to S3 to be downloadable in JSON format.
I encountered this issue a few days ago and am sharing the question along with its answer here to assist other users who might face the same or a similar challenge. If you believe there are any further improvements needed for clarity, please let me know!
The process of uploading large data to an AWS S3 bucket in parts can be achieved using the s3manager package. The key is to use the Body field of the s3manager.UploadInput struct, which accepts an io.Reader.
You can use io.Pipe, which provides you with a reader (which you attach to the Body field of UploadInput) and a writer (which you write each chunk to). As you write each chunk, it is streamed to your S3 bucket. Remember to handle any errors that arise during the upload, and don't forget to close the writer so S3 can finish the upload.
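In isolation, the pattern looks roughly like this (a minimal sketch where the S3 upload is stood in for by io.Copy to io.Discard, just to show the reader/writer handoff):

package main

import (
    "fmt"
    "io"
)

func main() {
    pr, pw := io.Pipe()

    // The consumer reads from the pipe in a goroutine; in the real code this is
    // where s3manager reads the Body of the upload.
    done := make(chan int64)
    go func() {
        n, _ := io.Copy(io.Discard, pr) // stand-in for the S3 upload
        done <- n
    }()

    // The producer writes chunks; each Write is handed over to the reader side.
    for _, chunk := range []string{"part-1 ", "part-2 ", "part-3"} {
        if _, err := pw.Write([]byte(chunk)); err != nil {
            panic(err)
        }
    }
    pw.Close() // signals EOF to the consumer

    fmt.Println("bytes consumed:", <-done)
}

The full S3 version below follows the same shape: the uploader goroutine takes the place of io.Copy, and the main goroutine writes chunks retrieved from the database.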
Here's a working example you can start with:
package main

import (
    "fmt"
    "io"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/credentials"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3/s3manager"
)
const (
    // Use environment variables for credentials instead of hard-coding them.
    AccessKey = "your-access-key"
    SecretKey = "your-secret-key"

    // Your bucket region and name.
    S3Region = "your-bucket-region" // e.g. eu-central-1
    S3Bucket = "your-bucket-name"

    // The number of bytes per chunk. Change this according to your case; this is
    // just an example value because here we are creating chunks from a short string.
    // You can use something like 10 * 1024 * 1024 to set the chunk size to 10 MB.
    ChunkSize = 50
)
func main() {
    // Create an AWS session.
    sess, err := session.NewSession(&aws.Config{
        Region:      aws.String(S3Region),
        Credentials: credentials.NewStaticCredentials(AccessKey, SecretKey, ""),
    })
    if err != nil {
        panic(err)
    }

    // Create the pipe and start the uploader in a goroutine; it reads from pr
    // while we write chunks to pw.
    pr, pw := io.Pipe()
    errch := make(chan error)
    go chunkUploader(sess, "example_file_1", errch, pr)

    // Retrieve the first chunk of data from the "database".
    chunk, skip := retrieveNextChunk(0, ChunkSize)
    for len(chunk) > 0 {
        // Writing to the pipe hands the chunk over to the uploader goroutine.
        if _, err := pw.Write(chunk); err != nil {
            break // the upload failed; the error is received from errch below
        }
        // Retrieve the next chunk from the "database" and the new skip value
        // for the next iteration.
        chunk, skip = retrieveNextChunk(skip, ChunkSize)
    }

    // Close the writer - this tells S3 to finish uploading your file, which will
    // then appear in your bucket's object list.
    pw.Close()

    // Check for upload errors.
    if err := <-errch; err != nil {
        panic(err)
    }
    fmt.Println("Data successfully uploaded")
}
// retrieveNextChunk is an example function for retrieving a part of the data
// from your database.
func retrieveNextChunk(skip int, limit int) ([]byte, int) {
    fulldata := "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
    var chunk string
    if skip+limit > len(fulldata) {
        chunk = fulldata[skip:]
    } else {
        chunk = fulldata[skip : skip+limit]
    }
    return []byte(chunk), skip + len(chunk)
}
// chunkUploader uploads everything written to the pipe to the S3 object "key".
func chunkUploader(sess *session.Session, key string, errch chan<- error, reader *io.PipeReader) {
    _, err := s3manager.NewUploader(sess).Upload(&s3manager.UploadInput{
        Bucket:             aws.String(S3Bucket),
        Key:                aws.String(key),
        Body:               reader,
        ContentDisposition: aws.String("attachment"), // or "inline" - the file will be displayed in the browser if possible
        ContentType:        aws.String("text/plain"), // change this to your content type, for example application/json
    })
    errch <- err
}
Feel free to adapt this code to your specific scenario. This approach allows you to upload large data to S3 in manageable chunks, avoiding memory issues and ensuring a smoother upload process.
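For the JSON use case described in the question, one possible adaptation (a sketch only - Row and fetchRows are hypothetical placeholders for your own record type and paginated query) is to let json.Encoder write straight to the pipe writer instead of building string chunks yourself. It slots into the example above in place of the retrieveNextChunk loop (add encoding/json to the imports):

// Row stands in for whatever record type your database export produces.
type Row struct {
    ID   int    `json:"id"`
    Name string `json:"name"`
}

// fetchRows is a hypothetical placeholder for your paginated database query;
// it should return an empty slice once there is nothing left to export.
func fetchRows(offset, limit int) []Row {
    // ... your real query goes here ...
    return nil
}

// streamJSON writes one JSON array to the pipe; the chunkUploader goroutine
// reads it on the other end and streams it to S3.
func streamJSON(pw *io.PipeWriter) {
    enc := json.NewEncoder(pw)
    pw.Write([]byte("["))
    first, offset := true, 0
    for {
        rows := fetchRows(offset, 1000)
        if len(rows) == 0 {
            break
        }
        for _, row := range rows {
            if !first {
                pw.Write([]byte(","))
            }
            first = false
            if err := enc.Encode(row); err != nil {
                pw.CloseWithError(err) // aborts the upload and reports the error
                return
            }
        }
        offset += len(rows)
    }
    pw.Write([]byte("]"))
    pw.Close() // tells s3manager the body is complete
}

If you go this route, also change the ContentType in chunkUploader to application/json, as noted in the comment above.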