python amazon-s3 boto3 amazon-emr

Porting usage of s3curl.pl to Python


I've been running this shell command:

~/s3curl/s3curl.pl --id mapreduce -- -sf https://$SERVER/$PATH >> $TEMP_FILE

And I want to port my script to Python.

I tried:

import boto3
client = boto3.client('s3')
response = client.get_object(Bucket=<server>, Key=<path>)

But I'm getting an error:

botocore.exceptions.ClientError: An error occurred (AllAccessDisabled) when calling the GetObject operation: All access to this object has been disabled

What am I doing wrong?

Thanks!


Solution

  • It turns out there was a file named .s3curl in the same directory as s3curl.pl, and it contained an access key ID and a secret key (used for signing requests, not for encryption).

    I translated it to a yaml file named s3.yaml that contains:

    awsSecretAccessKeys:
      mapreduce:
        id: <insert id here>
        key: <insert key here>
    

    And the Python 3 equivalent of the s3curl.pl call is:

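    import base64
    import hmac
    from datetime import datetime, timezone
    from hashlib import sha1

    import requests
    import yaml


    def download_file_from_s3(s3_server, path, export_path):
        url = s3_server + path
        with open('s3.yaml') as f:
            s3_conf = yaml.safe_load(f)['awsSecretAccessKeys']['mapreduce']

        # The Date header must be in UTC to match the +0000 suffix.
        now = datetime.now(timezone.utc).strftime('%a, %d %b %Y %H:%M:%S +0000')
        # String to sign: verb, Content-MD5 (empty), Content-Type (empty), date, resource.
        to_sign = 'GET\n\n\n{}\n{}'.format(now, path)
        digest = hmac.new(s3_conf['key'].encode('utf-8'), to_sign.encode('utf-8'), sha1).digest()
        signature = base64.b64encode(digest).decode('ascii')
        headers = {'Date': now, 'Authorization': 'AWS {}:{}'.format(s3_conf['id'], signature)}
        response = requests.get(url, headers=headers, stream=True)
        response.raise_for_status()

        # Append, mirroring the ">>" redirect in the shell command.
        with open(export_path, 'ab') as f:
            for block in response.iter_content(4096):
                f.write(block)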
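
The signing step in the function above can be pulled out and checked without touching the network. The sketch below is a minimal, standard-library-only version of it, assuming Python 3 and the legacy "AWS id:signature" header scheme (Signature Version 2) that the answer's code produces. The function name `sign_v2_get` and the `EXAMPLE_ID` / `EXAMPLE_KEY` values are hypothetical placeholders, not part of s3curl or boto3, and whether a given endpoint still accepts this scheme is something to verify against that endpoint.

```python
import base64
import hashlib
import hmac
from datetime import datetime, timezone
from email.utils import format_datetime


def sign_v2_get(access_id, secret_key, path, when=None):
    """Build Date and Authorization headers for a plain GET of `path`.

    Hypothetical helper: `path` is assumed to be the resource exactly as it
    appears after the host in the URL (with a leading slash).
    """
    # Use an aware UTC datetime so the formatted date ends in +0000.
    when = when or datetime.now(timezone.utc)
    date = format_datetime(when)  # locale-independent RFC 2822 date

    # Verb, empty Content-MD5, empty Content-Type, date, resource.
    to_sign = 'GET\n\n\n{}\n{}'.format(date, path)

    # hmac works on bytes in Python 3, so encode both key and message.
    digest = hmac.new(secret_key.encode('utf-8'),
                      to_sign.encode('utf-8'),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode('ascii')

    return {
        'Date': date,
        'Authorization': 'AWS {}:{}'.format(access_id, signature),
    }


if __name__ == '__main__':
    # Placeholder credentials, for illustration only.
    headers = sign_v2_get('EXAMPLE_ID', 'EXAMPLE_KEY', '/bucket/key.txt')
    print(headers['Date'])
    print(headers['Authorization'])
```

The returned dictionary can be passed directly as `headers=` to `requests.get`. Using `email.utils.format_datetime` instead of `strftime` avoids locale-dependent day and month names in the Date header.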