I am doing a school project in which I need to run a document analysis on a form using Textract and pass the output to A2I, where an algorithm determines whether the form is approved, rejected, or needs review. The Textract Lambda function should be triggered whenever a document is uploaded to S3. However, I get syntax errors when I follow this documentation: https://docs.aws.amazon.com/textract/latest/dg/API_StartDocumentAnalysis.html
My code is:
import urllib.parse
import boto3

print('Loading function')

##Clients
s3 = boto3.client('s3')
textract = boto3.client('textract')

def analyzedata(bucketName,documentKey):
    print("Loading")
    AnalyzedData= textract.StartDocumentAnalysis("DocumentLocation": {
        "S3Object": {
            "Bucket": "bucketName",
            "Name": "documentKey",
    })
    detectedText = ''
    # Print detected text
    for item in AnalyzedData['Blocks']:
        if item['BlockType'] == 'LINE':
            detectedText += item['Text'] + '\n'
    return detectedText

def writeTextractToS3File(textractData, bucketName, createdS3Document):
    print('Loading writeTextractToS3File')
    generateFilePath = os.path.splitext(createdS3Document)[0] + '.csv'
    s3.put_object(Body=textractData, Bucket=bucketName, Key=generateFilePath)
    print('Generated ' + generateFilePath)

def lambda_handler(event, context):
    #print("Received event: " + json.dumps(event, indent=2))
    # Get the object from the event and show its content type
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    try:
        detectedText = analyzedata(bucket, key)
        writeTextractToS3File(detectedText, bucket, key)
        return 'Processing Done!'
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e
The code is not yet complete, but I am already getting a syntax error:
{
  "errorMessage": "Syntax error in module 'lambda_function': invalid syntax (lambda_function.py, line 13)",
  "errorType": "Runtime.UserCodeSyntaxError",
  "stackTrace": [
    " File \"/var/task/lambda_function.py\" Line 13\n AnalyzedData= textract.Start_Document_Analysis(\"DocumentLocation\": { \n"
  ]
}
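According to the boto3 docs, the method name is snake_case and its parameters are passed as Python keyword arguments (name=value rather than "name": value), so your call should look more like:
AnalyzedData = textract.start_document_analysis(
    DocumentLocation={
        "S3Object": {
            "Bucket": bucketName,   # the function argument, not the literal string "bucketName"
            "Name": documentKey,
        }
    },
    FeatureTypes=["FORMS"],  # required; "FORMS" is an assumption for a form document
)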
Also note that the FeatureTypes parameter is listed as required.
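One more thing to watch for once the syntax error is gone: start_document_analysis is asynchronous and returns a JobId rather than Blocks, so the loop over AnalyzedData['Blocks'] in the question would fail with a KeyError. The results come from get_document_analysis. Below is a minimal sketch of that flow, not a definitive implementation. The names analyze_form, extract_lines, poll_seconds and max_polls are hypothetical, FeatureTypes=["FORMS"] is an assumption, and the client is passed in as an argument so the function can be tried without AWS access.

```python
import time


def extract_lines(blocks):
    """Join the text of every LINE block, one line of output per block."""
    return "".join(b["Text"] + "\n" for b in blocks if b["BlockType"] == "LINE")


def analyze_form(textract, bucket_name, document_key, poll_seconds=5, max_polls=60):
    """Start an asynchronous Textract job, wait for it, and return the detected lines.

    `textract` is expected to behave like boto3.client('textract').
    """
    job = textract.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket_name, "Name": document_key}},
        FeatureTypes=["FORMS"],  # assumption: the document is a form
    )
    job_id = job["JobId"]

    # Poll until the job is no longer IN_PROGRESS, giving up after max_polls tries.
    for _ in range(max_polls):
        result = textract.get_document_analysis(JobId=job_id)
        if result["JobStatus"] != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("Textract job {} did not finish in time".format(job_id))

    if result["JobStatus"] not in ("SUCCEEDED", "PARTIAL_SUCCESS"):
        raise RuntimeError("Textract job {} ended as {}".format(job_id, result["JobStatus"]))

    # Large results are paginated; follow NextToken until it is absent.
    blocks = list(result["Blocks"])
    while "NextToken" in result:
        result = textract.get_document_analysis(JobId=job_id, NextToken=result["NextToken"])
        blocks.extend(result["Blocks"])

    return extract_lines(blocks)
```

In the Lambda this could be called as analyze_form(textract, bucket, key). Polling inside a Lambda counts against the function timeout, so for larger documents the NotificationChannel parameter (an SNS topic) is probably a better fit. For a single-page form, the synchronous analyze_document call returns Blocks directly and avoids the polling altogether. Separately, writeTextractToS3File in the question uses os.path.splitext, so the module also needs import os.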