I am using the guide at https://learn.microsoft.com/en-us/python/api/overview/azure/ai-formrecognizer-readme?view=azure-python to recognize content with Databricks.
The code that I'm using is:
from azure.ai.formrecognizer import FormRecognizerClient
from azure.core.credentials import AzureKeyCredential
endpoint = "https://<region>.api.cognitive.microsoft.com/"
credential = AzureKeyCredential("<api_key>")
form_recognizer_client = FormRecognizerClient(endpoint, credential)
with open("/dbfs/mnt/lake/RAW/export/sentimenttest.txt", "rb") as fd:
    form = fd.read()
poller = form_recognizer_client.begin_recognize_content(form)
form_pages = poller.result()
for content in form_pages:
    for table in content.tables:
        print("Table found on page {}:".format(table.page_number))
        print("Table location {}:".format(table.bounding_box))
        for cell in table.cells:
            print("Cell text: {}".format(cell.text))
            print("Location: {}".format(cell.bounding_box))
            print("Confidence score: {}\n".format(cell.confidence))
    if content.selection_marks:
        print("Selection marks found on page {}:".format(content.page_number))
        for selection_mark in content.selection_marks:
            print("Selection mark is '{}' within bounding box '{}' and has a confidence of {}".format(
                selection_mark.state,
                selection_mark.bounding_box,
                selection_mark.confidence
            ))
Note that the path I'm using is:
/dbfs/mnt/lake/RAW/export/sentimenttest.txt
When I execute the code I get the error:
ValueError: Content type could not be auto-detected. Please pass the content_type keyword argument.
Can someone let me know what I need to do to fix this?
Prerequisites
• Python 2.7, or 3.5 or later is required to use this package.
• You must have an Azure subscription and a Cognitive Services or Form Recognizer resource to use this package.
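According to the documentation, begin_recognize_content extracts text and content/layout information from a given document. The input document must be one of the supported content types: 'application/pdf', 'image/jpeg', 'image/png', 'image/tiff' or 'image/bmp'. Your file has a .txt extension, and plain text is not in that list, which is most likely why the content type cannot be auto-detected. You would need to supply the document in one of the supported formats.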
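To see why a .txt file trips the auto-detection, it can help to check the first bytes of the file yourself before calling the service. The sketch below is my own illustration, not the SDK's code: the function name sniff_content_type and the signature table are assumptions, based on the well-known leading bytes ("magic numbers") of the five supported formats.

```python
from typing import Optional

# Well-known leading bytes of the supported formats.
# This table is an illustrative assumption, not taken from the SDK.
_SIGNATURES = [
    (b"%PDF", "application/pdf"),
    (b"\xff\xd8", "image/jpeg"),
    (b"\x89PNG", "image/png"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
    (b"BM", "image/bmp"),
]


def sniff_content_type(data: bytes) -> Optional[str]:
    """Return a supported MIME type if the data starts with a known signature, else None."""
    for signature, mime in _SIGNATURES:
        if data.startswith(signature):
            return mime
    return None


if __name__ == "__main__":
    # A PDF header is recognized; plain text matches none of the signatures.
    print(sniff_content_type(b"%PDF-1.7 rest of file"))
    print(sniff_content_type(b"this is just some text"))
```

If this returns None for the bytes you read from /dbfs/mnt/lake/RAW/export/sentimenttest.txt, the file is not in a format the service accepts. Note that the check is only a heuristic: a text file that happens to start with "BM" would be misreported as a bitmap.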
New in version v2.1: the pages, language and reading_order keyword arguments, and support for image/bmp content.
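Once you have a file in a supported format, you can pass content_type explicitly, along with the optional keyword arguments mentioned above. Here is a minimal sketch, assuming a PDF input and a package version that targets API v2.1. The helper names build_recognize_kwargs and recognize_pdf are mine, and the sample path in the comment at the bottom is hypothetical.

```python
def build_recognize_kwargs(content_type, pages=None, language=None, reading_order=None):
    """Collect keyword arguments for begin_recognize_content, skipping unset options."""
    kwargs = {"content_type": content_type}
    if pages is not None:
        kwargs["pages"] = pages  # e.g. ["1", "3-5"]
    if language is not None:
        kwargs["language"] = language  # e.g. "en"
    if reading_order is not None:
        kwargs["reading_order"] = reading_order  # "basic" or "natural"
    return kwargs


def recognize_pdf(path, endpoint, api_key, **options):
    """Send a PDF to the service with an explicit content type and return the pages."""
    # Imported here so the helper above can be used without the Azure SDK installed.
    from azure.ai.formrecognizer import FormRecognizerClient
    from azure.core.credentials import AzureKeyCredential

    client = FormRecognizerClient(endpoint, AzureKeyCredential(api_key))
    with open(path, "rb") as fd:
        form = fd.read()
    kwargs = build_recognize_kwargs("application/pdf", **options)
    poller = client.begin_recognize_content(form, **kwargs)
    return poller.result()


# Example call (hypothetical path; endpoint and key are placeholders):
# pages = recognize_pdf("/dbfs/mnt/lake/RAW/export/sample.pdf",
#                       "https://<region>.api.cognitive.microsoft.com/",
#                       "<api_key>", pages=["1"], language="en")
```

Setting content_type on the original .txt file is unlikely to help, since the service would still receive a document in an unsupported format.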
Refer to this link for more information.