java json google-cloud-platform artificial-intelligence cloud-document-ai

How to read a JSON response string into a Document AI Document object in Java?


I'm working with another API that calls the Google Document AI API and saves the JSON response to a file. I'm trying to read that JSON string from the file into a Document object. How should this be done?

I tried the following, but it doesn't work:

import com.google.cloud.documentai.v1.Document;
import java.io.FileInputStream;

Document document = Document.parseFrom(new FileInputStream("src/main/resources/responseFromAPICall.json"));
System.out.println(document.getText());

I'm getting this error:

Exception in thread "main" com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
    at com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:129)
    at com.google.protobuf.CodedInputStream$StreamDecoder.checkLastTagWas(CodedInputStream.java:2124)
    at com.google.protobuf.CodedInputStream$StreamDecoder.readGroup(CodedInputStream.java:2358)

Solution

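  • I ran into this issue today as well. This answer gave me the starting point for a solution. The exception occurs because Document.parseFrom expects the binary protobuf wire format, whereas the file contains JSON text, so it has to be parsed with JsonFormat (from protobuf-java-util) instead.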

    If your JSON file was saved from a call to Document AI and looks like this:

    {
      "document": {
        ...
        "text": "...",
        ...
      },
      "humanReviewStatus": {...}
    }
    

    you may use the following code snippet:

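    import java.io.Reader;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    
    import com.google.cloud.documentai.v1.Document;
    import com.google.cloud.documentai.v1.ProcessResponse;
    import com.google.protobuf.util.JsonFormat;
    
    public class ReadProcessResponse {
        public static void main(String[] args) throws Exception {
            Path filePath = Paths.get("src/main/resources/responseFromAPICall.json");
            ProcessResponse.Builder responseBuilder = ProcessResponse.newBuilder();
            // JsonFormat reads protobuf's JSON mapping; parseFrom() only reads the binary format.
            try (Reader reader = Files.newBufferedReader(filePath)) {
                JsonFormat.parser().merge(reader, responseBuilder);
            }
            // The parsed response wraps the Document under its "document" field.
            Document document = responseBuilder.getDocument();
            System.out.println(document.getText());
        }
    }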
    

    If your JSON file contains only the "document" object:

    {
      ...
      "text": "...",
      ...
    }
    

    This code will do the trick:

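    import java.io.Reader;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    
    import com.google.cloud.documentai.v1.Document;
    import com.google.protobuf.util.JsonFormat;
    
    public class ReadDocument {
        public static void main(String[] args) throws Exception {
            Path filePath = Paths.get("src/main/resources/responseFromAPICall.json");
            Document.Builder docBuilder = Document.newBuilder();
            // The file is the Document itself, so merge straight into a Document builder.
            try (Reader reader = Files.newBufferedReader(filePath)) {
                JsonFormat.parser().merge(reader, docBuilder);
            }
            Document document = docBuilder.build();
            System.out.println(document.getText());
        }
    }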
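
As background on the exception in the question: a binary protobuf parser treats each leading byte as a field tag, with the field number in the upper bits and the wire type in the lower three bits. The sketch below uses only the JDK and decodes the first byte of a JSON object, '{', the way a binary parser would. The class and method names are hypothetical and exist only for illustration. It is meant to suggest why the stack trace mentions readGroup and an end-group tag; the exact point of failure depends on the bytes that follow.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: decodes a single-byte protobuf tag by hand.
final class WireTagDemo {

    // Field number of a tag that fits in one byte (value below 0x80).
    public static int fieldNumber(int tagByte) {
        return tagByte >>> 3;
    }

    // Wire type is stored in the lowest three bits of the tag.
    public static int wireType(int tagByte) {
        return tagByte & 0x7;
    }

    // Wire type names as listed in the protobuf encoding documentation.
    public static String wireTypeName(int wireType) {
        switch (wireType) {
            case 0: return "VARINT";
            case 1: return "I64";
            case 2: return "LEN";
            case 3: return "SGROUP";
            case 4: return "EGROUP";
            case 5: return "I32";
            default: return "INVALID";
        }
    }

    public static void main(String[] args) {
        // A JSON object file starts with '{', which is byte 0x7B in UTF-8.
        int tag = "{".getBytes(StandardCharsets.UTF_8)[0] & 0xFF;
        System.out.println("field=" + fieldNumber(tag)
                + " wireType=" + wireTypeName(wireType(tag)));
    }
}
```

Read as a tag, '{' looks like the start of a group on field 15, so the binary parser begins reading a group and later fails to find a matching end-group tag in the JSON text. Parsing the file with JsonFormat, as in the answer above, avoids this because it reads the JSON mapping rather than the wire format.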