I have an api gateway with path and query string params setup and lambda integration using a lambda with a pandas layer (awswrangler). I want to be able to have the user specify the bucket as the path param and a query string param which will dictate whether data is returned to them either as json or csv (downloadable) Basically, I want it to download the csv automatically if they specify fmt=csv in the url. Here is the gateway with the two formats:
https://m0fhdyq5.execute-api.us-east-1.amazonaws.com/v1/br-candles?fmt=json
https://m0fhddyq5.execute-api.us-east-1.amazonaws.com/v1/br-candles?fmt=csv
br-candles is the bucket
Here is my lambda code. Currently, I only have "statuscode: 200 for the csv format since I dont know how to do what I want.
import json
import awswrangler as wr
def lambda_handler(event, context):
print(event)
bucket_name = event['params']['path']['bucket']
format = event['params']['querystring']['fmt']
full_path = f"s3://{bucket_name}"
print(bucket_name, format)
raw_df = wr.s3.read_csv(path=full_path, path_suffix=['.csv'], use_threads=True)
for df in raw_df:
if format == 'json':
df = raw_df.to_json(orient="records")
parsed = json.loads(df)
return {
'body': (parsed)
}
elif format == 'csv':
for df in raw_df:
#df = df.to_string(index=False)
#print (df)
return {
"statusCode": 200
}
else:
return {
"statusCode": 300
}
Specify the bucket as a path param
To specify the bucket as a path param, the simplest is to define a path variable of the form /base-url-of-your-api/{bucket}
in your API Gateway route. Up to your lambda to check whether the given parameter is correct and to return an HTTP error of not.
You will end up with something like this:
{bucket}
bucket_name = event['requestContext']['path'].split('/')[-1]
I'd rather use a functional name instead of a physical bucket name though. This not to disclose to the end user the name of your S3 buckets and to keep the flexibility to organize your buckets differently in the future without changing the API contract:
/your-api-base-url/{functionalName}
''' with some proper error handling to be added '''
buckets_mapping = {
'functionalNameAlpha' : 'bucket-name-for-alpha',
'functionalNameBeta' : 'bucket-name-for-beta',
'functionalNameGamma' : 'bucket-name-for-gamma'
}
functional_name = event['requestContext']['path'].split('/')[-1]
bucket = buckets_mapping[functional_name ]
Return format
Regarding the return format, the cleaner way is to rely on the standard Accept
HTTP header. This header is set by the caller and defines a list of accepted formats. For example: Accept: application/json,application/xml,text/csv
means that these json, xml and csv are the three format that the caller understands, in this order of preference.
The headers are available in the event
passed to your lambda_handler
mounted in proxy mode. Similarly as above, up to your lambda to check whether the accepted formats are compatible with your application, and to return a HTTP 406 "Not Acceptable" if not:
accepted_formats = event['headers']['Accept'] # for example: "application/xml,text/csv;q=0.9,application/json,text/*;q=0.2"
Sidenote
I maintain an open source libraray that is dedicated to make all this easier by providing ways to (among other things):
It is called awsmate
and it is available here: https://github.com/shlublu/awsmate
I'll try to find some time to edit this answer and post some code here should you be interested (let me know in comments). In the meantime, it comes with an example application that could show how to quickly do what you wish I hope.