I am using an AWS Lambda function where a user can specify a bucket key on an HTML page which corresponds to a CSV file in S3. The CSV file is read using boto3, tokenized, and fed to a SageMaker endpoint, and a JSON string is returned by the Lambda function to the HTML page via AWS API Gateway. When I test all of this in the AWS Lambda console the JSON string is returned fine; however, when I execute the Lambda function from the static webpage (hosted on S3) I get the following error messages in the browser console:
POST https://... 502
favicon.ico:1 GET https://... 403
When I check the AWS CloudWatch logs for my Lambda function I see:
[ERROR] NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 54, in lambda_handler
    s3_response = s3_client.get_object(Bucket=BUCKET, Key=KEY)
  File "/var/runtime/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/runtime/botocore/client.py", line 960, in _make_api_call
    raise error_class(parsed_response, operation_name)
This is confusing, since I do not get this error when I test my Lambda function using test events.
Here is my Lambda code:
import boto3
import json
import re
import pandas as pd # imported using AWS managed Lambda Layer AWSSDKPandas-Python310
REPLACE_NO_SPACE = re.compile("(\.)|(\;)|(\:)|(\!)|(\')|(\?)|(\,)|(\")|(\()|(\))|(\[)|(\])")
REPLACE_WITH_SPACE = re.compile("(<br\s*/><br\s*/>)|(-)|(/)")
def tokenize_words(comment):
    words = REPLACE_NO_SPACE.sub("", comment.lower())
    words = REPLACE_WITH_SPACE.sub(" ", words)
    words = re.sub('[^0-9a-zA-Z]+', " ", words)
    words = words.split()
    words = " ".join(words)
    return words
def output_df(output, scraped_df):
    output_dict = {'Irrelevant': [],
                   'SomewhatRelevant': [],
                   'Relevant': []}
    for i in range(len(output)):
        d = output[i]
        p = d['prob']
        output_dict['Irrelevant'].append(round(p[0], 2))
        output_dict['SomewhatRelevant'].append(round(p[1], 2))
        output_dict['Relevant'].append(round(p[2], 2))
    prob_df = pd.DataFrame.from_dict(output_dict)
    pred_df = pd.concat([scraped_df, prob_df], axis=1)
    sorted_df = pred_df.sort_values(by=['Irrelevant', 'SomewhatRelevant', 'Relevant'],
                                    ascending=True,
                                    ignore_index=True)
    return sorted_df
def lambda_handler(event, context):
    # S3 URI: s3://scraped-comments/2022/Cybersecurity.csv
    CSV_FILE = str(event['body'])
    BUCKET = 'scraped-comments'
    KEY = f"2022/{CSV_FILE}.csv"
    runtime = boto3.Session().client('sagemaker-runtime')
    # S3 data
    s3_client = boto3.client("s3")
    s3_response = s3_client.get_object(Bucket=BUCKET, Key=KEY)
    df = pd.read_csv(s3_response.get("Body"))
    tokenized_comments = [tokenize_words(x) for x in df.Comment.values.tolist()]
    num_comments = len(tokenized_comments)
    payload = {"instances": tokenized_comments, "configuration": {"k": num_comments}}
    # Now we use the SageMaker runtime to invoke our endpoint, sending the text we were given
    response = runtime.invoke_endpoint(EndpointName=endpoint_name,
                                       ContentType='application/json',
                                       Body=json.dumps(payload))
    output = json.loads(response['Body'].read().decode('utf-8'))
    bt_df = output_df(output, df)
    json_string = bt_df.to_json(orient='records')
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'text/plain', 'Access-Control-Allow-Origin': '*'},
        'body': json_string
    }
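For reference, here is a minimal debugging sketch (assuming a Lambda proxy integration, where event['body'] is the raw request body string, or None when the body is empty). Adding these prints near the top of lambda_handler writes the incoming body and the resolved key to CloudWatch, which makes a NoSuchKey error like the one above easy to trace:

    # Debugging sketch: log exactly what API Gateway delivered.
    # With a proxy integration, event['body'] is the raw body string, or None
    # if the body was empty, so str(event['body']) can become the literal
    # string "None" and the key becomes "2022/None.csv".
    print("raw body:", repr(event.get('body')))
    print("resolved key:", f"2022/{str(event['body'])}.csv")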
Additionally, I am a novice at coding in HTML and JavaScript, so perhaps the problem lies in my HTML page, which is here:

<head>
<title>Ranking Engine</title>
<!-- jQuery is loaded here; the $() calls below depend on it -->
<script>
function submitForm(oFormElement) {
    var xhr = new XMLHttpRequest();
    var jsonData = JSON.parse(xhr.responseText);
    xhr.onload = function() {
        let container = $("#container");
        let table = $("<table>");
        let cols = Object.keys(jsonData[0]);
        let thead = $("<thead>");
        let tr = $("<tr>");
        $.each(cols, function(i, item){
            let th = $("<th>");
            th.text(item);
            tr.append(th);
        });
        thead.append(tr); // Append the header row to the header
        table.append(tr) // Append the header to the table
        // Loop through the JSON data and create table rows
        $.each(jsonData, function(i, item){
            let tr = $("<tr>");
            // Get the values of the current object in the JSON data
            let vals = Object.values(item);
            // Loop through the values and create table cells
            $.each(vals, (i, elem) => {
                let td = $("<td>");
                td.text(elem); // Set the value as the text of the table cell
                tr.append(td); // Append the table cell to the table row
            });
            table.append(tr); // Append the table row to the table
        });
        container.append(table) // Append the table to the container element
    }
    xhr.open(oFormElement.method, oFormElement.action, true);
    var key = document.getElementById('key');
    xhr.send(key.value);
    return false;
}
</script>
</head>
<body>
<div>
<h1>Text Classification Engine</h1>
<p>Type a bucket key to see comments classified as one of the following topics:</p>
<div style="float: left; width: 50%;">
<ul>
<li>Relevant</li>
<li>Irrelevant</li>
</ul>
</div>
<div style="float: right; width: 50%;">
<ul>
<li>Somewhat Relevant</li>
</ul>
</div>
<form method="POST"
action="https://..."
onsubmit="return submitForm(this);" >
<div>
<label for="key">Enter a key here (example provided):</label>
<textarea rows="5" id="key">Cybersecurity</textarea>
</div>
<button type="submit">Submit</button>
</form>
<div id="container"></div>
</div>
</body>
It may also be worth noting that my Lambda function usually takes ~40 seconds to complete execution, so perhaps there is a timeout issue, but I am unsure (the default API Gateway integration timeout is 29 seconds, and I believe a timeout there would surface as a 504 rather than a 502).
The API Gateway for feeding data to the SageMaker endpoint is a REST API with POST method execution, Integration Request Type Lambda, Method Request Auth: NONE, Method Response HTTP Status: 200, and Models: application/json => Empty.
The API Gateway for communication between my Lambda function and the front-end webapp is a REST API with ANY method execution, Method Request Auth: NONE, Integration Request Type Lambda Proxy, Method Response HTTP Status: Proxy, and Models: application/json => Empty.
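As I understand it, with a Lambda proxy integration API Gateway returns a 502 when the function raises an unhandled exception or returns something other than the expected {'statusCode': ..., 'headers': ..., 'body': ...} shape, so the NoSuchKey exception above would explain the 502 in the browser console. A hypothetical sketch (not in my current code) of catching the S3 error and still returning a well-formed proxy response:

import botocore.exceptions

try:
    s3_response = s3_client.get_object(Bucket=BUCKET, Key=KEY)
except botocore.exceptions.ClientError:
    # Return a well-formed proxy response instead of letting the exception
    # bubble up, which API Gateway would otherwise surface as a 502.
    return {
        'statusCode': 404,
        'headers': {'Content-Type': 'application/json',
                    'Access-Control-Allow-Origin': '*'},
        'body': json.dumps({'error': f'No such object: s3://{BUCKET}/{KEY}'})
    }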
I suspect this might be a permission error. My Lambda function's execution role has the AWSLambdaBasicExecutionRole, AmazonS3FullAccess, and AmazonSageMakerFullAccess policies attached. The S3 bucket that holds the CSV data has Block All Public Access enabled, with the following bucket policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowS3Access",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::xxxxxxxxxxxx:role/service-role/LAMBDA_ROLE"
            },
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::scraped-comments",
                "arn:aws:s3:::scraped-comments/*"
            ]
        }
    ]
}
I also do not have a CORS policy set if that is of any relevance.
I discovered that the reason 'None' was being sent to my Lambda function is that var jsonData = JSON.parse(xhr.responseText); needs to be inside the xhr.onload = function() callback. After that small change, I was able to retrieve data from my S3 bucket. Thank you @AnonCoward for suggesting I print CSV_FILE and check the CloudWatch logs.
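For anyone hitting the same issue, this is roughly what the corrected handler looks like; the only change is moving the JSON.parse of xhr.responseText into the onload callback, where the response actually exists:

function submitForm(oFormElement) {
    var xhr = new XMLHttpRequest();
    xhr.onload = function() {
        // Parse the response here, once it has actually arrived.
        var jsonData = JSON.parse(xhr.responseText);
        // ... build the table from jsonData exactly as before ...
    };
    xhr.open(oFormElement.method, oFormElement.action, true);
    var key = document.getElementById('key');
    xhr.send(key.value);
    return false;
}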