
S3 Select with boto3 - InternalError


Has anyone got "S3 Select" (https://aws.amazon.com/blogs/aws/s3-glacier-select/ , https://aws.amazon.com/about-aws/whats-new/2018/04/amazon-s3-select-is-now-generally-available/) working with boto3 (or even with the CLI or another SDK)? I am getting the cryptic InternalError below:

I am running this on an EC2 instance that has an IAM role:

[ec2-user@ip-blah bin]$ ./python
Python 2.7.13 (default, Jan 31 2018, 00:17:36)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import boto3
>>> s3 = boto3.client('s3')
>>> r = s3.select_object_content(
...         Bucket='mybucketname',
...         Key='mypath/file.txt',
...         ExpressionType='SQL',
...         Expression="select count(*) from s3object s",
...         InputSerialization = {'CSV': {"FileHeaderInfo": "Use"}},
...         OutputSerialization = {'CSV': {}},
... )
Traceback (most recent call last):
  File "<stdin>", line 7, in <module>
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/botocore/client.py", line 314, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/botocore/client.py", line 612, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InternalError) when calling the SelectObjectContent operation (reached max retries: 4): We encountered an internal error. Please try again.

Solution

  • My guesses:

    • Check the permissions on S3.
    • Adapt 'RecordDelimiter', 'FieldDelimiter' and 'QuoteCharacter' in InputSerialization if necessary.
    • Check the structure of the CSV file (the number of headers matches the number of data columns, special characters are escaped, no stray whitespace, \n is used as the newline, etc.).

    • Try the simplest possible request: ... Expression="SELECT * FROM S3Object s", InputSerialization={'CSV': {}}, OutputSerialization={'CSV': {}}, ...

    Hope that helps a little!
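
The guesses above can be turned into a small script. This is only a sketch for Python 3, not a confirmed fix for the InternalError. The helper names (build_select_kwargs, read_records, rows_with_wrong_width, run_select) are made up for illustration, and the delimiter defaults are assumptions about the file. The boto3 documentation lists 'USE', 'IGNORE' and 'NONE' as the values for FileHeaderInfo, so the sketch uses the upper-case spelling. It builds a request with explicit CSV settings, reads the records out of the event stream that select_object_content returns, and checks a local copy of the file for rows whose column count differs from the header.

```python
import csv
import io


def build_select_kwargs(bucket, key, expression="SELECT * FROM S3Object s",
                        header="NONE", field_delimiter=",",
                        record_delimiter="\n", quote_character='"'):
    """Build keyword arguments for s3.select_object_content with explicit CSV settings."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {
            "CSV": {
                "FileHeaderInfo": header,  # "USE", "IGNORE" or "NONE"
                "FieldDelimiter": field_delimiter,    # assumed default
                "RecordDelimiter": record_delimiter,  # assumed default
                "QuoteCharacter": quote_character,    # assumed default
            }
        },
        "OutputSerialization": {"CSV": {}},
    }


def read_records(response):
    """Concatenate the 'Records' events of a select_object_content response."""
    chunks = []
    for event in response["Payload"]:
        # Other event types (e.g. 'Stats', 'End') carry no row data.
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks).decode("utf-8")


def rows_with_wrong_width(text, delimiter=","):
    """Return 1-based row numbers whose column count differs from the header row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []
    return [n for n, row in enumerate(reader, start=2) if len(row) != len(header)]


def run_select(bucket, key):
    """Run the minimal query; needs boto3 and AWS credentials (e.g. an EC2 role)."""
    import boto3  # imported here so the helpers above work without boto3
    s3 = boto3.client("s3")
    response = s3.select_object_content(**build_select_kwargs(bucket, key))
    return read_records(response)
```

Calling run_select('mybucketname', 'mypath/file.txt') sends the simplest query first. If that works, switch header to "USE" and put the original count(*) expression back one change at a time, so the setting that triggers the error becomes visible.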