google-cloud-platform, cloud-document-ai

Google Document AI model not reading document in JSON


I have been trying out the various Document AI processors (Form Parser, Document OCR, and the specialized ones). I am testing on some purchase order PDFs, so I am using the "purchase order" processor. The PDF is scanned and run through the processor, but the JSON output does not contain structured key-value pairs.

Here is the output:

{ document: { text: 'Order -POS178463\nP\nTo\nDelivery Address\nPOSTURITE\nOcushield Ltd\nDoor 14 (060) -Posturite Goods In\nAccount No.\nOCU01\nDhruvin Patel\nTVS SCS Rico\nOrder No.\nPOS178463\nLaunch Lab, Floor 4\n215 Park Lane\nOrder Date\n24/04/2023\n124 Goswell Road\nMinworth\nCust Ref.\nLondon\nB35 6LJ\nEC1V 7DP\nUnited Kingdom\nUnited Kingdom\nProduct\nDescription\nQty\nUnit Price\nAmount\nOCUVDU27BZ\nAnti Blue Light Privacy Filter 27" W (16:9)\n10\n£45.32\n£453.20\nContact:\nGoods Total\n£453.20\nCust Ref:\nDelivery Notes:\nPosturite Limited\nt. +44 (0) 345 345 0010\ne. purchaseconfirmation@posturite.co.uk\nwww.posturite.co.uk\nB\nISO 9001\nISO 14001\nISO 27001\nISO 45001\nISOQAR\nUKAS\nENSTEMS\n0026\nThe Mill, Berwick, East Sussex BN26 6SZ, UK\nRegistered in England No. 2574809\nCertificate Number 5312\n' },
  humanReviewStatus: { state: 'SKIPPED' } }

I expected structured key-value pairs in the JSON output and am unsure why they are missing.


Solution

  • It looks like the Document JSON includes only the text field.

    Are you sending a fieldMask with your request? A fieldMask limits which fields are returned in the Document object of the response, so a mask that lists only text would produce exactly this kind of output. Refer to "Send a processing request" in the documentation for how to process a document with and without a fieldMask, and to "Handle the processing response" for how to extract the data from the output Document.

    For the Form Parser, the key-value pairs are in the Document.pages.formFields field; for all entity extraction processors, such as the Purchase Order Parser, they are in the Document.entities field.
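
As a rough illustration of where those fields live, here is a minimal Node.js sketch that reads key-value pairs from a Document object already returned by a processor. The helper names (anchorText, entitiesToPairs, formFieldsToPairs) are made up for this example and are not part of the client library. The field names assume the camelCase Document shape returned by the Node.js client, where text anchor indexes may arrive as strings and a startIndex of 0 may be omitted.

```javascript
// Sketch only: `document` is assumed to be the plain object at result.document.

// Resolve a textAnchor to the substring of document.text that it points at.
function anchorText(document, textAnchor) {
  const segments = (textAnchor && textAnchor.textSegments) || [];
  return segments
    .map(seg => {
      // Assumption: startIndex is omitted when it is 0, and int64 values
      // may be serialized as strings, so coerce both ends to numbers.
      const start = Number(seg.startIndex || 0);
      const end = Number(seg.endIndex || 0);
      return (document.text || '').substring(start, end);
    })
    .join('')
    .trim();
}

// Entity extraction processors (e.g. the purchase order parser):
// key-value data is in document.entities, with nested values in `properties`.
function entitiesToPairs(document) {
  return (document.entities || []).map(entity => ({
    key: entity.type,
    value: entity.mentionText || '',
    confidence: entity.confidence,
    properties: entitiesToPairs({ entities: entity.properties }),
  }));
}

// Form Parser: key-value data is in document.pages[].formFields,
// and the text is recovered through each field's textAnchor.
function formFieldsToPairs(document) {
  const pairs = [];
  for (const page of document.pages || []) {
    for (const field of page.formFields || []) {
      pairs.push({
        key: anchorText(document, field.fieldName && field.fieldName.textAnchor),
        value: anchorText(document, field.fieldValue && field.fieldValue.textAnchor),
      });
    }
  }
  return pairs;
}
```

With a response like the one in the question, which contains only text, both helpers return an empty array. That would be consistent with a fieldMask that left out entities and pages.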