I have a web app served with Apache2 running Python Flask in the backend. The app is hosted on Linode and relies heavily on their S3-compatible Object Storage. I'm using `boto3` to interact with the storage. My issue is with the `generate_presigned_post` method when used in production. It returns the following structure:
```
{
    'url': 'https://eu-central-1.linodeobjects.com/my-s3-bucket',
    'fields': {
        'ACL': 'private',
        'key': 'foo.bar',
        'AWSAccessKeyId': 'FOOBAR',
        'policy': 'base64longhash...',
        'signature': 'foobar'
    }
}
```
Every time I call this method in the same Python session, the `policy` value gets longer (roughly 1.5x longer on each subsequent request). After a few requests the policy becomes really large (tens of MB) and the app breaks. If I restart the Python service, the policy size resets.
After digging through the `boto3` documentation and some threads on GitHub and here, I couldn't find anything that helped me reset the S3 connection without restarting the whole Python session. Restarting the Apache2 service periodically is not a good approach, so my workaround was to call `generate_presigned_post` from a standalone script using `subprocess` and parse the string output back into JSON before using it. That is not ideal, as I'd rather not keep spawning external processes from inside Apache. The main functions I use are below:
```python
import json
from subprocess import PIPE, Popen
import boto3

# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_ENDPOINT_URL and AWS_BUCKET are defined elsewhere
AWS_BUCKET_PARAMS = {'ACL': 'private'}

# connect to my linode's s3 bucket
def awsSign():
    return boto3.client('s3', aws_access_key_id=AWS_ACCESS_KEY_ID,
                        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
                        endpoint_url=AWS_ENDPOINT_URL)

# generate presigned post object for uploading files
def awsPostForm(file_path):
    s3 = awsSign()
    return s3.generate_presigned_post(AWS_BUCKET, file_path, AWS_BUCKET_PARAMS,
                                      [AWS_BUCKET_PARAMS], 1800)

# generate post object from external script
def awsPostFormTerminal(file_path):
    cmd = ['python3', '-c',
           f'from utils import awsPostForm; print(awsPostForm("{file_path}"))']
    output = Popen(cmd, stdout=PIPE).communicate()[0]
    # the child prints a Python dict repr; swap quotes so json can parse it
    return json.loads(output.decode('utf-8').replace('\n', '').replace("'", '"'))
```
The problem happens regardless of whether I call `awsSign()` once or many times for a list of files.
In short, I'm looking for a better way to retrieve subsequent POST forms from `generate_presigned_post` in the same Python session without the `policy` growing on every new request. Is there a proper way to reset the `boto3` connection, are there parameters I missed when setting up the API calls, or is this something particular to Linode's S3 Object Storage service?
If anyone can point me in the right direction, I'd appreciate it!
Well, it turns out it was a rookie mistake; I got the hint from Linode's Q&A. So, answering my own question:
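The `AWS_BUCKET_PARAMS` variable was being modified by reference when passed through `generate_presigned_post`: the values returned in `fields` (such as `key`, `policy` and `signature`) were written back into the very dict I passed as `Fields`. Because the same dict is also passed inside `Conditions`, each new policy embedded the previous call's `policy` string, which is why it kept growing. Copying the global variable inside the function's scope before sending the request solved the issue.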
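To make the mechanism concrete, here is a minimal sketch. `StandInS3Client` is a hypothetical stand-in, not botocore's code: it only mimics the in-place behavior described above, so the example runs without credentials or the `boto3` package. `buggy_aws_post_form` reproduces the original aliasing, and `aws_post_form` is the copy-based fix; both take the client as a parameter, which is an assumption made here so the sketch is testable.

```python
import base64
import copy
import json

AWS_BUCKET_PARAMS = {'ACL': 'private'}


class StandInS3Client:
    """Hypothetical stand-in for a boto3 S3 client (NOT botocore's code).

    It mimics the behavior described in the answer: signed values are
    written into the same dict the caller passed as Fields, and the policy
    is built from the caller's Conditions.
    """

    def generate_presigned_post(self, Bucket, Key, Fields=None,
                                Conditions=None, ExpiresIn=3600):
        fields = Fields if Fields is not None else {}
        conditions = list(Conditions or []) + [{'bucket': Bucket}, {'key': Key}]
        policy = {'expiration': ExpiresIn, 'conditions': conditions}
        encoded = base64.b64encode(json.dumps(policy).encode('utf-8')).decode('ascii')
        # In-place writes: these land in the caller's dict, not in a copy.
        fields['key'] = Key
        fields['policy'] = encoded
        fields['signature'] = 'not-a-real-signature'
        return {'url': f'https://example.invalid/{Bucket}', 'fields': fields}


def buggy_aws_post_form(s3, bucket, file_path, expires_in=1800):
    # Original pattern: the global dict is passed as Fields AND inside Conditions,
    # so the policy signed on one call becomes a condition of the next one.
    return s3.generate_presigned_post(bucket, file_path, AWS_BUCKET_PARAMS,
                                      [AWS_BUCKET_PARAMS], expires_in)


def aws_post_form(s3, bucket, file_path, expires_in=1800):
    # Fix: hand the client fresh copies, so nothing it writes can leak back
    # into the module-level AWS_BUCKET_PARAMS or into the next request.
    fields = copy.deepcopy(AWS_BUCKET_PARAMS)
    conditions = [copy.deepcopy(AWS_BUCKET_PARAMS)]
    return s3.generate_presigned_post(bucket, file_path, Fields=fields,
                                      Conditions=conditions, ExpiresIn=expires_in)


def policy_lengths(post_form, calls=4):
    """Length of the base64 policy returned by `calls` consecutive requests."""
    s3 = StandInS3Client()
    return [len(post_form(s3, 'my-s3-bucket', 'foo.bar')['fields']['policy'])
            for _ in range(calls)]


if __name__ == '__main__':
    print('copied :', policy_lengths(aws_post_form))        # same length every call
    print('aliased:', policy_lengths(buggy_aws_post_form))  # grows on every call
```

In the app itself, the fix would be called as `aws_post_form(awsSign(), AWS_BUCKET, file_path)`. For a flat dict like this one, a shallow `dict(AWS_BUCKET_PARAMS)` would work just as well; `copy.deepcopy` only stays safe if nested values are added later. Passing separate copies for `Fields` and `Conditions` also keeps the two arguments from aliasing each other.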