Google Cloud Vision API error reading PDF

I am currently trying to process a large PDF document using the Google Cloud Vision API. When reading the document, I receive an error that says "json_format.Parse( error". I have attached my code below. How can I fix this?

Solution

You are getting the error on that line because you are passing json_string, which has type <class 'bytes'>, and a nonexistent object, vision.types.AnnotateFilesResponse(), to json_format.Parse(). According to the protobuf documentation, that function expects:

google.protobuf.json_format.Parse(text, message, ignore_unknown_fields=False, descriptor_pool=None)

Parses a JSON representation of a protocol message into a message.

Parameters:

text – Message JSON representation.

message – A protocol buffer message to merge into.

ignore_unknown_fields – If True, do not raise errors for unknown fields.

descriptor_pool – A Descriptor Pool for resolving types. If None use the default.

Returns: The same message passed as argument.

Raises: ParseError: On JSON parsing problems.

Since your goal is to read the response from async_batch_annotate_files(), note that this method saves its JSON response to the Cloud Storage bucket output location you defined. You can read the data in json_string and parse it into a dictionary, then work your way through the dictionary by referring to the AnnotateFileResponse reference, using the code below:

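import json

# blob_list is the list of blobs found at your output location
output = blob_list[0]
# The downloaded content is raw bytes; json.loads() turns it into a dict
json_string = output.download_as_string()
response = json.loads(json_string)

# Each entry in 'responses' holds the result for one page
first_page_response = response['responses'][0]
annotation = first_page_response['fullTextAnnotation']

print('Full text:\n')
print(annotation['text'])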
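Since the document is large, you will probably want the text of every page rather than only the first one. Here is a minimal sketch of that, using only the json module. The function name extract_page_texts and the sample payload are made up for illustration, and I am assuming that a page with no detected text may simply lack the 'fullTextAnnotation' key, so the lookup uses .get() instead of indexing.

```python
import json


def extract_page_texts(json_bytes):
    """Return the detected text of every page in one output file.

    json_bytes is the raw content of a single JSON output file.
    Pages without a 'fullTextAnnotation' entry yield an empty string.
    """
    response = json.loads(json_bytes)
    texts = []
    for page_response in response.get('responses', []):
        annotation = page_response.get('fullTextAnnotation', {})
        texts.append(annotation.get('text', ''))
    return texts


# Hypothetical payload standing in for a downloaded output file
sample = json.dumps({
    'responses': [
        {'fullTextAnnotation': {'text': 'Page one text'}},
        {},  # assumed shape of a page with no detected text
        {'fullTextAnnotation': {'text': 'Page three text'}},
    ]
}).encode('utf-8')

for page_number, text in enumerate(extract_page_texts(sample), start=1):
    print(f'Page {page_number}: {text!r}')
```

With your real data you would pass the downloaded bytes (json_string) to the function instead of the sample.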

NOTE: Make sure that you are reading the correct JSON response file (output = blob_list[0]); otherwise, parsing the results will yield an error.
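One way to guard against picking the wrong object is to keep only the names that end in .json before indexing into the list. This is just a sketch: the helper select_json_outputs and the object names below are hypothetical, and I am assuming the listing may contain entries other than the JSON results (for example a folder placeholder).

```python
def select_json_outputs(blob_names):
    """Keep only object names that look like JSON result files.

    The original order of the listing is preserved.
    """
    return [name for name in blob_names if name.endswith('.json')]


# Hypothetical listing of an output location
names = [
    'results/',
    'results/output-1-to-2.json',
    'results/output-3-to-4.json',
]

json_names = select_json_outputs(names)
if not json_names:
    raise ValueError('No JSON output files found at the output location')

print(json_names[0])
```

With real blobs, the same check can be applied to each blob's name attribute, e.g. [b for b in blob_list if b.name.endswith('.json')].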