My system is S3 -> Lambda -> SQS -> EC2. When a file is uploaded to S3, it triggers an S3 event notification to Lambda. Lambda captures the S3 bucket name and object key with:
s3_bucket = event['Records'][0]['s3']['bucket']['name']
s3_key = event['Records'][0]['s3']['object']['key']
The message is converted to JSON and sent to SQS. The JSON body is built with:
json.dumps({'from_s3': 's3://{b}/{k}'.format(b=s3_bucket, k=s3_key)})
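The send itself is essentially the following (the queue URL below is a placeholder for my real one):
sqs_client = boto3.client('sqs')
queue_url = 'https://sqs.us-east-1.amazonaws.com/123456789012/my-queue'  # placeholder
sqs_client.send_message(
    QueueUrl=queue_url,
    MessageBody=json.dumps({'from_s3': 's3://{b}/{k}'.format(b=s3_bucket, k=s3_key)})
)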
Then EC2 polls SQS with boto3:
response = sqs_client.receive_message(QueueUrl=queue_url, AttributeNames=['All'], MaxNumberOfMessages=5)
messages = response['Messages']
body = json.loads(messages[i]['Body'])
from_s3 = body['from_s3']
s3_bucket, s3_key = re.match(r"s3:\/\/(.+?)\/(.+)", from_s3).groups()
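The EC2 worker then downloads the object with those values, roughly like this (the local target path is just an example):
s3_client = boto3.client('s3')
local_path = '/tmp/' + s3_key.split('/')[-1]  # example target path
s3_client.download_file(s3_bucket, s3_key, local_path)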
According to the logs, if an uploaded file name contains spaces, e.g. "abc def.jpg", the received value of s3_key is "abc+def.jpg". As a result, when I pass that value to download_file of the boto3 S3 client, it returns a 404 error.
How should I encode the S3 object key in Lambda so that the boto3 S3 client can download the file?
To obtain the unquoted key, you can use:
import urllib.parse

objectKey = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])
Also, please note that multiple records might be provided to your AWS Lambda function in a single event. It should loop through the records like this:
for record in event['Records']:
    s3_bucket = record['s3']['bucket']['name']
    s3_key = urllib.parse.unquote_plus(record['s3']['object']['key'])
    # Do stuff here
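For completeness, a minimal sketch of the corrected Lambda handler could look like this (the queue URL is a placeholder; adjust it to your environment). S3 event notifications URL-encode the object key, so a space arrives as "+", and unquote_plus restores the original key before it is forwarded to SQS:

import json
import urllib.parse
import boto3

sqs_client = boto3.client('sqs')
QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/my-queue'  # placeholder

def lambda_handler(event, context):
    for record in event['Records']:
        s3_bucket = record['s3']['bucket']['name']
        # The key in the event is URL-encoded, e.g. "abc def.jpg" arrives as "abc+def.jpg"
        s3_key = urllib.parse.unquote_plus(record['s3']['object']['key'])
        sqs_client.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({'from_s3': 's3://{b}/{k}'.format(b=s3_bucket, k=s3_key)})
        )

With the decoded key stored in the SQS message, the download_file call on EC2 refers to the actual object key and the 404 error should go away.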