google-cloud-vision

Google cloud vision API error reading pdf


I am currently trying to process a large PDF document using the Google Cloud Vision API. When reading the results, I get an error at the line "json_format.Parse(". I have attached my code below. How can I fix this?


Solution

  • You are getting the error on that line because you are passing json_string, which has type <class 'bytes'>, together with vision.types.AnnotateFilesResponse(), an object that does not exist (the message type is named AnnotateFileResponse), to json_format.Parse(). That function expects:

    google.protobuf.json_format.Parse(text, message, ignore_unknown_fields=False, descriptor_pool=None)

    Parses a JSON representation of a protocol message into a message.

    Parameters:

    • text – Message JSON representation.
    • message – A protocol buffer message to merge into.
    • ignore_unknown_fields – If True, do not raise errors for unknown fields.
    • descriptor_pool – A Descriptor Pool for resolving types. If None use the default.

    Returns: The same message passed as argument.

    Raises: ParseError: On JSON parsing problems.

    Since your goal is to read the response from async_batch_annotate_files(), note that this method saves its JSON output to the Cloud Storage bucket location you defined as the output destination. You can simply read the data in json_string and parse it into a dictionary, then work your way through the dictionary by referring to the AnnotateFileResponse reference, using the code below:

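    import json

    # blob_list: the output blobs listed from your Cloud Storage destination
    # prefix, as in your existing code
    output = blob_list[0]
    # download_as_bytes() returns bytes; download_as_string() is its older, deprecated name
    json_string = output.download_as_bytes()
    # json.loads accepts bytes as well as str (Python 3.6+)
    response = json.loads(json_string)
    first_page_response = response['responses'][0]
    annotation = first_page_response['fullTextAnnotation']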
    
    print('Full text:\n')
    print(annotation['text'])
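The snippet above only reads the first page of one output file. As a sketch, assuming each output file follows the AnnotateFileResponse JSON layout (a top-level "responses" list whose items may carry "fullTextAnnotation" and a "context" object with "pageNumber"), the same dictionary walk can be extended to every page in the file. The function name extract_page_texts is hypothetical.

```python
import json


def extract_page_texts(json_bytes):
    """Return (page_number, text) for every page in one Vision output file.

    Hypothetical helper; assumes the AnnotateFileResponse JSON layout with
    a top-level "responses" list.
    """
    # json.loads accepts both bytes and str on Python 3.6+
    response = json.loads(json_bytes)
    pages = []
    for index, page_response in enumerate(response.get("responses", []), start=1):
        # "context.pageNumber" gives the page in the original PDF; if it is
        # absent, fall back to the position within this file
        page_number = page_response.get("context", {}).get("pageNumber", index)
        # Pages without detected text may have no "fullTextAnnotation" key
        annotation = page_response.get("fullTextAnnotation", {})
        pages.append((page_number, annotation.get("text", "")))
    return pages


if __name__ == "__main__":
    sample = (
        b'{"responses": ['
        b'{"fullTextAnnotation": {"text": "Page one"}, "context": {"pageNumber": 1}},'
        b'{"context": {"pageNumber": 2}}]}'
    )
    for number, text in extract_page_texts(sample):
        print(f"Page {number}: {text!r}")
```

Using .get() instead of direct indexing means a page with no text yields an empty string rather than a KeyError.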
    

    NOTE: Make sure you are reading the correct JSON response file (output = blob_list[0]); otherwise parsing the results will raise an error.
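One way to pick the correct files is to filter the listed object names instead of relying on blob_list[0]. This is a sketch under assumptions: it assumes the output files are named like "prefix/output-1-to-2.json" (adjust the pattern if yours differ), and ordered_output_names is a hypothetical helper. Sorting numerically matters because Cloud Storage lists objects in lexicographic order, so "output-11-to-12.json" would otherwise come before "output-3-to-4.json".

```python
import re

# Assumed naming pattern for the async output files, e.g. "results/output-1-to-2.json"
OUTPUT_NAME = re.compile(r"output-(\d+)-to-(\d+)\.json$")


def ordered_output_names(blob_names):
    """Keep only output JSON files and sort them by their first page number.

    Hypothetical helper. blob_names is a list of object names, for example
    [blob.name for blob in blob_list]; folder placeholders and unrelated
    objects are skipped.
    """
    matches = []
    for name in blob_names:
        match = OUTPUT_NAME.search(name)
        if match:
            matches.append((int(match.group(1)), name))
    # Sort by the numeric start page, not by the raw string
    return [name for _, name in sorted(matches)]


if __name__ == "__main__":
    names = [
        "results/",
        "results/output-11-to-12.json",
        "results/output-3-to-4.json",
        "results/output-1-to-2.json",
        "results/notes.txt",
    ]
    print(ordered_output_names(names))
```

Each selected name could then be downloaded with bucket.blob(name).download_as_bytes() and parsed as shown above.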