Search code examples
goamazon-s3downloadbufferaws-sdk-go

Why does getObject take longer when setting the file size dynamically and storing in memory in aws-sdk-go?


Based on my testing, when downloading 6 MB file from s3 from a fargate task it takes around 1.3 seconds. However, a 1.7 MB files takes 150 ms to finish. Assuming linear growth, 6MB should have taken at most around 600 ms. But this was not the case.

buff := aws.NewWriteAtBuffer([]byte{}))
getObjectInput := &s3.GetObjectInput{
    Bucket: aws.String(bucket),
    Key:    aws.String(uri),
}
, err = u.downloader.DownloadWithContext(ctx, buff, getObjectInput)

I spent a few hours trying to debug my service, and found that this set of code is not eats a lot of memory based on this. https://github.com/aws/aws-sdk-go/issues/3085

To correct this and use less memory, I changed my code to this.

getHeaderInput := &s3.HeadObjectInput{
    Bucket: aws.String(bucket),
    Key:    aws.String(uri),
}
res, _:= u.s3.HeadObject(getHeaderInput)
buff := aws.NewWriteAtBuffer(make([]byte, 0, *res.ContentLength))
getObjectInput := &s3.GetObjectInput{
    Bucket: aws.String(bucket),
    Key:    aws.String(uri),
}
, err = u.downloader.DownloadWithContext(ctx, buff, getObjectInput)

What I found is that the download time also decreased, and the overall duration of code is also shorter when downloading the 6MB file which is around 600ms as what it should be based on linear growth.


Solution

  • Buffer slice []byte{} or, equivalently, make([]byte, 0, 0) appends in amortized time.

    Buffer slice make([]byte, 0, *res.ContentLength) appends in linear time.


    BenchmarkAmortized-12  100  10361800 ns/op  33249829 B/op  31 allocs/op
    BenchmarkLinear-12     710   1612166 ns/op   6291461 B/op   1 allocs/op
    

    package main
    
    import "testing"
    
    func BenchmarkAmortized(b *testing.B) {
        var src = make([]byte, 6*1024*1024)
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            var dst = make([]byte, 0, 0)
            for len(dst) < len(src) {
                dst = append(dst, src[len(dst):len(dst)+1024]...)
            }
        }
    }
    
    func BenchmarkLinear(b *testing.B) {
        var src = make([]byte, 6*1024*1024)
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            var dst = make([]byte, 0, len(src))
            for len(dst) < len(src) {
                dst = append(dst, src[len(dst):len(dst)+1024]...)
            }
        }
    }
    

    Appending to slices

    The variadic function append appends zero or more values x to a slice s and returns the resulting slice of the same type as s. The core type of s must be a slice of type []E. The values x are passed to a parameter of type ...E and the respective parameter passing rules apply. As a special case, if the core type of s is []byte, append also accepts a second argument with core type bytestring followed by .... This form appends the bytes of the byte slice or string.

    append(s S, x ...E) S  // core type of S is []E
    

    If the capacity of s is not large enough to fit the additional values, append allocates a new, sufficiently large underlying array that fits both the existing slice elements and the additional values. Otherwise, append re-uses the underlying array.


    The Go Blog: Arrays, slices (and strings): The mechanics of 'append'


    Amortized time complexity

    Amortized complexity analysis is most commonly used with data structures that have state that persists between operations. The basic idea is that an expensive operation can alter the state so that the worst case cannot occur again for a long time, thus amortizing its cost.