I am currently trying to process a large PDF document using the Google Cloud Vision API. When reading the document, I receive an error that says "json_format.Parse( error". I have attached my code below. How can I fix this?
You are getting the error on that line of code because you are passing json_string, which has type <class 'bytes'>, together with a non-existent object, vision.types.AnnotateFilesResponse(), to json_format.Parse(), which is documented as:
google.protobuf.json_format.Parse(text, message, ignore_unknown_fields=False, descriptor_pool=None)
Parses a JSON representation of a protocol message into a message.
Parameters:
- text – Message JSON representation.
- message – A protocol buffer message to merge into.
- ignore_unknown_fields – If True, do not raise errors for unknown fields.
- descriptor_pool – A Descriptor Pool for resolving types. If None use the default.
Returns: The same message passed as argument.
Raises: ParseError: On JSON parsing problems.
Since your goal is to read the response from async_batch_annotate_files(), note that the JSON response from this method is saved to the Cloud Storage bucket output location you defined. You can read and parse the data in json_string by converting it to a dictionary with json.loads(), and then work your way through the dictionary by referring to the AnnotateFileResponse reference, using the code below:
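import json

# blob_list is the list of output blobs from your existing code
output = blob_list[0]
# Returns bytes (newer google-cloud-storage releases prefer download_as_bytes())
json_string = output.download_as_string()
# json.loads accepts bytes, so no decoding step is needed
response = json.loads(json_string)
# The file holds one response per page; take the first page
first_page_response = response['responses'][0]
annotation = first_page_response['fullTextAnnotation']
print('Full text:\n')
print(annotation['text'])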
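Each output file can hold responses for more than one page, so the same dictionary access can be put in a loop. The following is only a minimal sketch: the helper name extract_page_texts and the sample payload are made up for illustration, and it assumes the file follows the AnnotateFileResponse JSON shape with a top-level 'responses' list.

```python
import json


def extract_page_texts(json_string):
    """Return the full text of every page found in one output file."""
    # json.loads accepts both str and UTF-8 bytes, so the downloaded
    # blob contents can be passed in directly.
    response = json.loads(json_string)
    texts = []
    for page_response in response.get("responses", []):
        # A page response may lack 'fullTextAnnotation' (for example,
        # when no text was detected), so fall back to an empty string.
        annotation = page_response.get("fullTextAnnotation", {})
        texts.append(annotation.get("text", ""))
    return texts


# Made-up sample standing in for the bytes downloaded from the bucket.
sample = b'{"responses": [{"fullTextAnnotation": {"text": "Page one"}}, {}]}'
print(extract_page_texts(sample))
```

Using .get() with defaults avoids a KeyError on pages without text, which is the likely failure if you index every page the way the first page is indexed.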
NOTE: Make sure that you are reading the correct JSON response file (output = blob_list[0]); otherwise, parsing the results will yield an error.
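One way to avoid picking the wrong object is to filter the blob names before downloading. This is a hedged sketch, not part of the Vision client: pick_result_names is a hypothetical helper, and it assumes the output files are named like output-1-to-2.json under your prefix, so check the actual names in your bucket first.

```python
import re


def pick_result_names(blob_names):
    """Keep names matching output-<first>-to-<last>.json, sorted by first page."""
    # Assumed naming pattern; adjust it if your bucket uses different names.
    pattern = re.compile(r"output-(\d+)-to-(\d+)\.json$")
    matches = []
    for name in blob_names:
        m = pattern.search(name)
        if m:
            matches.append((int(m.group(1)), name))
    # Sort numerically by the first page number, then drop the sort key.
    return [name for _, name in sorted(matches)]


# Made-up names, including a folder placeholder that is not a result file.
names = ["results/", "results/output-3-to-4.json", "results/output-1-to-2.json"]
print(pick_result_names(names))
```

With a real bucket, the names would come from the name attribute of each blob in blob_list, and you would download only the blobs whose names survive the filter.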