Has anyone got "S3 Select" (https://aws.amazon.com/blogs/aws/s3-glacier-select/, https://aws.amazon.com/about-aws/whats-new/2018/04/amazon-s3-select-is-now-generally-available/) working with boto3 (or even the CLI or another SDK)? I am getting the cryptic InternalError below.
I'm running this on an EC2 instance that has an IAM role:
[ec2-user@ip-blah bin]$ ./python
Python 2.7.13 (default, Jan 31 2018, 00:17:36)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import boto3
>>> s3 = boto3.client('s3')
>>> r = s3.select_object_content(
... Bucket='mybucketname',
... Key='mypath/file.txt',
... ExpressionType='SQL',
... Expression="select count(*) from s3object s",
... InputSerialization = {'CSV': {"FileHeaderInfo": "Use"}},
... OutputSerialization = {'CSV': {}},
... )
Traceback (most recent call last):
  File "<stdin>", line 7, in <module>
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/botocore/client.py", line 314, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/botocore/client.py", line 612, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InternalError) when calling the SelectObjectContent operation (reached max retries: 4): We encountered an internal error. Please try again.
My guesses:
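Check the structure of the CSV file: the number of header fields should match the number of data columns, special characters should be escaped, there should be no stray whitespace, and \n should be the newline delimiter.
Try a simpler query with default serialization settings: ... Expression="SELECT * FROM S3Object s", InputSerialization={'CSV': {}}, OutputSerialization={'CSV': {}}, ...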
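The second guess could be sketched roughly as below. This is only a minimal sketch, not a confirmed fix for the InternalError: run_select is a hypothetical helper name, and it assumes the client passed in behaves like boto3.client('s3'). It also shows how the result is read, since select_object_content returns an event stream under 'Payload' rather than a plain body.

```python
def run_select(client, bucket, key, expression):
    """Run an S3 Select query and return the result rows as text.

    `client` is assumed to behave like boto3.client('s3');
    run_select itself is a hypothetical helper, not part of boto3.
    """
    response = client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType='SQL',
        Expression=expression,
        # Empty dicts leave every CSV option at its default value.
        InputSerialization={'CSV': {}},
        OutputSerialization={'CSV': {}},
    )
    chunks = []
    # The result arrives as a stream of events; only 'Records' events
    # carry data, the others ('Stats', 'Progress', 'End', ...) are skipped.
    for event in response['Payload']:
        if 'Records' in event:
            chunks.append(event['Records']['Payload'])
    # Join the raw bytes before decoding so a multi-byte character that is
    # split across two events is still decoded correctly.
    return b''.join(chunks).decode('utf-8')
```

With real credentials it could be called as run_select(boto3.client('s3'), 'mybucketname', 'mypath/file.txt', 'SELECT * FROM S3Object s'). If that works and the original call does not, the "FileHeaderInfo": "Use" setting or the header row itself would be the next thing to look at.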
Hope that helps a little!
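For the first guess (header count versus data columns), a local check on a copy of the file might look like the sketch below. It assumes Python 3 and a comma-delimited file; find_ragged_rows is a hypothetical helper name, and passing this check does not guarantee that S3 Select will accept the file.

```python
import csv
import io


def find_ragged_rows(text, delimiter=','):
    """Return (line_number, field_count) for each row whose number of
    fields differs from the header row. Hypothetical helper for a local
    sanity check only."""
    # newline='' lets the csv module handle line endings itself.
    reader = csv.reader(io.StringIO(text, newline=''), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to compare against
    expected = len(header)
    bad = []
    for row in reader:
        if len(row) != expected:
            # line_num is the number of source lines read so far.
            bad.append((reader.line_num, len(row)))
    return bad
```

For example, find_ragged_rows(open('file.txt', newline='').read()) on a downloaded copy of the object returns an empty list when every row has the same number of fields as the header.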