I would like to create a human review loop for images that undergone OCR using Amazon Textract and Entity Extraction using Amazon Comprehend.
My process is:
KEY_VALUE_SET
to textract's JSON output per the docscrowd-textract-analyze-document
element in the template and feed it the modified textract outputWhat fails to work in this process is step 5. My custom entities are not rendered properly. By "fails to work" I mean that the entities are not highlighted on the image when I click them on the sidebar. There is no error in the browser's console.
Has anyone tried such a thing?
Sorry for not including examples. I will remove secrets/PII from my files and attach them to the question
I used the AWS documentation of the a2i-crowd-textract-detection human task element to generate the value of the initialValue
attribute. It appears the doc for that attribute is incorrect. While the the doc shows that the value should be in the same format as the output of Textract, namely:
[
{
"BlockType": "KEY_VALUE_SET",
"Confidence": 38.43309020996094,
"Geometry": { ... }
"Id": "8c97b240-0969-4678-834a-646c95da9cf4",
"Relationships": [
{ "Type": "CHILD", "Ids": [...]},
{ "Type": "VALUE", "Ids": [...]}
],
"EntityTypes": ["KEY"],
"Text": "Foo bar"
},
]
the a2i-crowd-textract-detection
expects the input to have lowerCamelCase attribute names (rather than UpperCamelCase). For example:
[
{
"blockType": "KEY_VALUE_SET",
"confidence": 38.43309020996094,
"geometry": { ... }
"id": "8c97b240-0969-4678-834a-646c95da9cf4",
"relationships": [
{ "Type": "CHILD", "ids": [...]},
{ "Type": "VALUE", "ids": [...]}
],
"entityTypes": ["KEY"],
"text": "Foo bar"
},
]
I opened a support case about this documentation error to AWS.