Tags: python, amazon-web-services, amazon-s3, boto3, linode

S3 presigned POST URLs issue using boto3


I have a web app served by apache2 with a Python Flask backend. The app is hosted on Linode and relies heavily on their S3 Object Storage, which I access through boto3. My issue is with the generate_presigned_post method when used in production. It returns the following structure:

{
 'url': 'https://eu-central-1.linodeobjects.com/my-s3-bucket', 
 'fields': {
   'ACL': 'private', 
   'key': 'foo.bar', 
   'AWSAccessKeyId': 'FOOBAR', 
   'policy': 'base64longhash...', 
   'signature': 'foobar'
  }
} 

Every time I call this method in the same Python session, the policy field comes back longer (about a 1.5x increase in length on each subsequent request). After a few requests the policy becomes really large (tens of MB) and the app breaks. If I restart the Python service, the policy size resets.

After digging through the boto3 documentation and some threads on GitHub and here, I couldn't find anything on resetting the S3 connection without restarting the whole Python session. Restarting the apache2 service periodically is not a good approach, so my workaround was to call generate_presigned_post from a standalone script via subprocess and parse its string output back into JSON before using it. That is not ideal, as I would rather not spawn external processes from inside apache. The main functions I use follow below:

import json

import boto3

AWS_BUCKET_PARAMS = {'ACL': 'private'}

# connect to my linode's s3 bucket
def awsSign():
    return boto3.client('s3', aws_access_key_id=AWS_ACCESS_KEY_ID, aws_secret_access_key=AWS_SECRET_ACCESS_KEY, endpoint_url=AWS_ENDPOINT_URL)

# generate presigned post object for uploading files
def awsPostForm(file_path):
    s3 = awsSign()
    return s3.generate_presigned_post(AWS_BUCKET, file_path, AWS_BUCKET_PARAMS, [AWS_BUCKET_PARAMS], 1800)

# generate post object from external script
def awsPostFormTerminal(file_path):
    from subprocess import Popen, PIPE
    cmd = [ 'python3', '-c', f'from utils import awsPostForm; print(awsPostForm("{file_path}"))' ]
    output = Popen( cmd, stdout=PIPE ).communicate()[0]
    return json.loads(output.decode('utf-8').replace('\n', '').replace("'", '"'))

The problem happens whether I call awsSign() once or many times for a list of files.
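As a side note on the workaround, awsPostFormTerminal parses the printed dict by swapping single quotes for double quotes, which breaks as soon as a value contains a quote, and it interpolates file_path straight into the child's source code. A minimal sketch of a sturdier round trip follows, assuming the child can print JSON itself. The child code, the example.invalid URL, and the function name post_form_via_subprocess are hypothetical stand-ins; in the real app the child would import awsPostForm from utils instead of building a dict.

```python
import json
import subprocess
import sys

# Hypothetical child program: it builds a stand-in form and prints it as JSON.
# The file path arrives via sys.argv, so it is never pasted into the source.
CHILD_CODE = (
    "import json, sys; "
    "form = {'url': 'https://example.invalid/bucket', "
    "'fields': {'ACL': 'private', 'key': sys.argv[1]}}; "
    "print(json.dumps(form))"
)


def post_form_via_subprocess(file_path):
    """Run the child with the same interpreter and decode its JSON output."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, file_path],
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return json.loads(result.stdout)


# A key containing both quote styles survives the round trip.
tricky_path = "it's a \"test\".bar"
form = post_form_via_subprocess(tricky_path)
print(form["fields"]["key"])
```

Printing with json.dumps in the child and reading with json.loads in the parent avoids any string surgery on the output.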

In short, I'm looking for a better way to retrieve subsequent post forms from generate_presigned_post in the same Python session without the policy growing on every new request. Is there a proper way to restart the boto3 connection, did I miss some parameters when setting up the API calls, or is this something particular to Linode's S3 object storage service?

If anyone can point me in the right direction, I'd appreciate it!


Solution

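  • Well, it turns out it was a rookie mistake. I got the hint from Linode's Q&A, so I'm answering my own question:

    The AWS_BUCKET_PARAMS variable was being modified in place after being passed to generate_presigned_post. Because the same dict was passed both as the fields and inside the conditions list, the entries added during one call (such as key and policy) were presumably carried into the next call's policy, which would explain the growth. Copying the global variable inside the function's scope before sending the request solved the issue.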
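The effect can be reproduced without boto3. The sketch below is not boto3's code: fake_presigned_post is a hypothetical stand-in that, by assumption, writes into the fields dict it receives, the way the answer describes. It compares passing one shared dict on every call against passing fresh copies.

```python
import base64
import json


def fake_presigned_post(key, fields, conditions):
    """Hypothetical stand-in for generate_presigned_post (not boto3 code).

    Assumption: like the behaviour described above, it mutates `fields`.
    """
    conditions = list(conditions)
    conditions.append({'key': key})
    fields['key'] = key  # in-place change to the caller's dict
    policy = json.dumps({'conditions': conditions})
    fields['policy'] = base64.b64encode(policy.encode('utf-8')).decode('utf-8')
    return {'url': 'https://example.invalid/bucket', 'fields': fields}


def post_form_shared(params, key):
    # Buggy pattern: one dict serves as the fields and as a condition,
    # so the previous call's policy leaks into the next call's conditions.
    return fake_presigned_post(key, params, [params])


def post_form_copied(params, key):
    # Fixed pattern: every call works on fresh copies of the shared dict.
    return fake_presigned_post(key, dict(params), [dict(params)])


def policy_lengths(make_form, calls=4):
    """Call make_form repeatedly with one shared dict; return policy sizes."""
    params = {'ACL': 'private'}
    return [len(make_form(params, 'foo.bar')['fields']['policy'])
            for _ in range(calls)]


shared_lengths = policy_lengths(post_form_shared)
copied_lengths = policy_lengths(post_form_copied)
print('shared dict:', shared_lengths)  # grows on every call
print('fresh copy: ', copied_lengths)  # stays constant
```

In the real function the same idea would amount to building a copy such as dict(AWS_BUCKET_PARAMS) inside awsPostForm and passing that instead of the global. A shallow copy is enough here because the dict only holds strings; a nested structure would need copy.deepcopy.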