
AWS S3 SelectObjectContent not returning results in AWS SDK for Go v2


I'm having difficulty getting SelectObjectContent to return any results. The frustrating part is that the query works in the console.

My test object is a simple JSON file, gzip-compressed and stored in a test bucket as testfile.json.gz:

{
  "Name": "Kevin",
  "Role": "engineer",
  "Color": "blue"
}

and my query is equally simple:

SELECT * FROM s3object LIMIT 5

Below is the function I've assembled for a specific use case:

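import (
    "context"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/service/s3"
    "github.com/aws/aws-sdk-go-v2/service/s3/types"
)

// S3SelectObjectContent runs an S3 Select SQL expression against a
// gzip-compressed JSON document and returns the raw SDK output.
// Config is assumed to be a package-level aws.Config loaded elsewhere
// (for example with config.LoadDefaultConfig).
func S3SelectObjectContent(bucket, region, objectKey, expression string) (*s3.SelectObjectContentOutput, error) {
    client := s3.NewFromConfig(Config, func(o *s3.Options) {
        o.Region = region
    })

    input := &s3.SelectObjectContentInput{
        Bucket:         &bucket,
        Key:            &objectKey,
        Expression:     &expression,
        ExpressionType: types.ExpressionTypeSql,
        // The object is a single JSON document (not JSON Lines), gzip-compressed.
        InputSerialization: &types.InputSerialization{
            JSON: &types.JSONInput{
                Type: types.JSONTypeDocument,
                // Type: types.JSONTypeLines
            },
            CompressionType: types.CompressionTypeGzip,
        },
        // Each output record is written as JSON followed by a newline.
        OutputSerialization: &types.OutputSerialization{
            JSON: &types.JSONOutput{
                RecordDelimiter: aws.String("\n"),
            },
        },
    }

    result, err := client.SelectObjectContent(context.TODO(), input)
    if err != nil {
        return nil, err
    }

    return result, nil
}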

Thanks in advance for your help and ideas.

I've tried adjusting the bucket and objectKey; I get a "not found" error if either is incorrect.

I've confirmed I have access to the bucket, because I can list its objects with ListObjectsV2.

I've confirmed the code is reaching the object: if I point it at a non-gzipped version of the file, it complains that the file is not compressed.
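Because CompressionType is set to GZIP, the object must actually be gzip data. A minimal sketch for producing such a test object locally with Go's standard compress/gzip package follows; the helper names gzipBytes and gunzipBytes are hypothetical, and writing testfile.json.gz to the working directory (uploading it is left to the CLI or SDK) is an assumption for illustration.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"os"
)

// sampleDoc is the test document from the question.
const sampleDoc = `{
  "Name": "Kevin",
  "Role": "engineer",
  "Color": "blue"
}
`

// gzipBytes (hypothetical helper) compresses data in gzip format,
// which is what CompressionType GZIP expects on the S3 Select side.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	// Close flushes buffered data and writes the gzip footer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzipBytes (hypothetical helper) reverses gzipBytes to check the round trip.
func gunzipBytes(data []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	compressed, err := gzipBytes([]byte(sampleDoc))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Write the compressed object locally; upload it separately.
	if err := os.WriteFile("testfile.json.gz", compressed, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("wrote %d compressed bytes\n", len(compressed))
}
```

Uploading an uncompressed file under the same settings reproduces the "not compressed" complaint described above, since the bytes lack the gzip header.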

My sandbox bucket is private and has a minimal bucket policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Statement1",
            "Effect": "Allow",
            "Principal": {
                "AWS": "*"
            },
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::kkevin-testbucket",
                "arn:aws:s3:::kkevin-testbucket/*"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:PrincipalAccount": "123456789064"
                }
            }
        }
    ]
}

EDIT: I can confirm that this CLI command also works and writes the correct output to output.json:

aws s3api select-object-content \
    --bucket kkevin-testbucket \
    --key testfile.json.gz \
    --expression "select * from s3object limit 5" \
    --expression-type 'SQL' \
    --input-serialization '{"JSON": {"Type": "Document"}, "CompressionType": "GZIP"}' \
    --output-serialization '{"JSON": {}}' "output.json" \
    --profile sandbox

Solution

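  • Your code is correct, but S3 Select delivers the query results as an event stream, so nothing appears until you read the events from it. I added the following after the SelectObjectContent call in your S3SelectObjectContent function: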

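    // The query results arrive on an event stream; it must be drained to see them.
    stream := result.GetStream()
    defer stream.Close()

    // Records events carry the result bytes; other event types
    // (Stats, Progress, Cont, End) are ignored here.
    for event := range stream.Events() {
        v, ok := event.(*types.SelectObjectContentEventStreamMemberRecords)
        if ok {
            value := string(v.Value.Payload)
            fmt.Print(value) // requires importing "fmt"
        }
    }

    // Err reports any error that ended the stream early.
    if err := stream.Err(); err != nil {
        return nil, err
    }
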
    and got the following output:

    {"Name":"Kevin","Role":"engineer","Color":"blue"}
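The loop above prints each Records payload as it arrives, which is enough to see the output. The Amazon S3 documentation notes that a Records message can contain a single record, partial records, or multiple records, so decoding each payload on its own may fail for larger results. A minimal standard-library sketch that buffers the payloads and splits them on the "\n" RecordDelimiter from OutputSerialization is below; the name collectRecords and the [][]byte input (standing in for successive v.Value.Payload values) are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// collectRecords (hypothetical helper) concatenates Records payload chunks
// and decodes one JSON object per "\n"-delimited record, matching
// RecordDelimiter: aws.String("\n") in the question's OutputSerialization.
func collectRecords(chunks [][]byte) ([]map[string]any, error) {
	var buf bytes.Buffer
	for _, c := range chunks {
		buf.Write(c)
	}
	var records []map[string]any
	for _, line := range bytes.Split(buf.Bytes(), []byte("\n")) {
		line = bytes.TrimSpace(line)
		if len(line) == 0 {
			continue // skip the empty tail after the final delimiter
		}
		var rec map[string]any
		if err := json.Unmarshal(line, &rec); err != nil {
			return nil, fmt.Errorf("decoding record %q: %w", line, err)
		}
		records = append(records, rec)
	}
	return records, nil
}

func main() {
	// The answer's record, split across two chunks to mimic a partial Records event.
	chunks := [][]byte{
		[]byte(`{"Name":"Kevin","Role":"engi`),
		[]byte("neer\",\"Color\":\"blue\"}\n"),
	}
	recs, err := collectRecords(chunks)
	if err != nil {
		panic(err)
	}
	for _, r := range recs {
		fmt.Println(r["Name"], r["Role"], r["Color"])
	}
}
```

Splitting on a raw newline is safe here because JSON encodes newlines inside strings as escapes. For large results, a streaming reader (for example bufio over an io.Pipe) would avoid holding the whole output in memory.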