google-cloud-platform, cloud-document-ai

How to batch send documents in DocumentAI?


I am calling processDocument with the Expense parser, as in the example here. Because the billing costs too much, instead of sending the documents one by one, I combine 10 documents into one PDF and call processDocument once. However, Document AI treats the 10 combined receipts as a single receipt, and instead of returning 10 separate total_amount entities (one per receipt), it returns only 1 total_amount. I want to combine 10 documents into one PDF and send it to lower the billing cost. In addition, I am looking for a way to think of each document independently from each other and extract its entities separately. Will batch processing work for me? What can I do? Can you help me, please?


Solution

  • Unfortunately, there is no way to make the billing cheaper, because Document AI pricing is calculated on a per-page basis: merging 10 one-page receipts into one 10-page PDF is still billed as 10 pages. See Document AI pricing.
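    As a quick worked check of that arithmetic, here is a minimal sketch that assumes a flat per-page price with no per-request minimum (an assumption; confirm against the pricing page for your processor). The helper name `billable_pages` is hypothetical.

```python
def billable_pages(page_counts):
    """Pages billed for a set of documents, assuming a flat per-page price (assumption)."""
    return sum(page_counts)


separate = billable_pages([1] * 10)  # ten one-page receipts sent one by one
combined = billable_pages([10])      # the same ten receipts merged into one PDF
print(separate, combined)            # 10 10
```

    Whatever the per-page rate is, multiplying it by the same page count gives the same charge, so merging files only changes how many requests you send, not what you pay.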

    Regarding your question:

    I am looking for a way to think of each document independently from each other and extract its entities separately. Will batch processing work for me?

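    Yes, batch processing will work for you, as long as each receipt is submitted as its own file instead of being merged into one PDF, because every input document in a batch is processed separately. The pricing, however, is the same as for processDocument; see the pricing information linked above.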
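    A minimal sketch of a batch request body in which every receipt is listed as a separate input document. The field names follow my understanding of the Document AI v1 REST `batchProcess` request (`inputDocuments.gcsDocuments.documents[]` with `gcsUri`/`mimeType`, and `documentOutputConfig.gcsOutputConfig.gcsUri`); verify them against the API reference before use. The body would be POSTed to a URL of the form `https://{location}-documentai.googleapis.com/v1/projects/{project}/locations/{location}/processors/{processor}:batchProcess`, and the Python client's `batch_process_documents` method takes an equivalent `BatchProcessRequest`. The bucket and object names are hypothetical.

```python
import json


def build_batch_request(gcs_uris, output_uri, mime_type="application/pdf"):
    """Build a batchProcess request body (REST JSON form, field names assumed from the v1 API).

    Each URI becomes its own input document, so the processor handles every
    receipt independently rather than as one merged receipt.
    """
    return {
        "inputDocuments": {
            "gcsDocuments": {
                "documents": [
                    {"gcsUri": uri, "mimeType": mime_type} for uri in gcs_uris
                ]
            }
        },
        "documentOutputConfig": {
            "gcsOutputConfig": {"gcsUri": output_uri}
        },
    }


# Hypothetical bucket and object names, for illustration only.
receipts = [f"gs://my-input-bucket/receipts/receipt_{i}.pdf" for i in range(10)]
body = build_batch_request(receipts, "gs://my-output-bucket/results/")
print(json.dumps(body, indent=2))
```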

    The main difference between batch processing and processDocument is that, instead of sending one request per document, batch processing sends all of your documents in a single request. The results are then written to the Cloud Storage (GCS) bucket that you define in the batch process options. See the batch process sample code.
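    Once the results are in the bucket, each receipt's entities can be read from its own output file. The sketch below assumes you have already downloaded each output JSON as text (for example with the `google-cloud-storage` client's `list_blobs` and `Blob.download_as_text`), that the files use the camelCase Document JSON (`entities[].type`, `entities[].mentionText`), and that each receipt produced a single output file (large documents may be split into several shards). The sample strings are made-up stand-ins, not real parser output.

```python
import json


def total_amounts(document_json):
    """Return the mentionText of every total_amount entity in one output Document (JSON text)."""
    doc = json.loads(document_json)
    return [
        entity.get("mentionText", "")
        for entity in doc.get("entities", [])
        if entity.get("type") == "total_amount"
    ]


# Illustrative stand-ins for two downloaded output files (not real results).
outputs = {
    "receipt_0.json": '{"entities": [{"type": "total_amount", "mentionText": "12.50"}]}',
    "receipt_1.json": '{"entities": [{"type": "supplier_name", "mentionText": "Shop"},'
                      ' {"type": "total_amount", "mentionText": "7.99"}]}',
}
for name, text in outputs.items():
    print(name, total_amounts(text))
```

    Because every receipt has its own output document, you get one total_amount per receipt instead of a single total for the merged PDF.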

    Note also that batch processing handles the documents asynchronously: once the request is sent, the processing happens on the backend, and you can poll the status of the request to see whether it is still running or has finished.
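    A minimal polling sketch, assuming the status comes back as a long-running operation dict with a `done` flag and, on failure, an `error` field (the shape of the REST operation resource). `get_operation` is a hypothetical callable standing in for the real status request. With the Python client you could instead call `result(timeout=...)` on the returned operation object, which blocks until it finishes.

```python
import time


def wait_for_operation(get_operation, poll_interval=10.0, timeout=3600.0, sleep=time.sleep):
    """Poll a long-running operation until its "done" flag is set.

    get_operation: hypothetical zero-argument callable returning the operation as a dict.
    """
    waited = 0.0
    while True:
        op = get_operation()
        if op.get("done"):
            if "error" in op:
                raise RuntimeError(f"Batch processing failed: {op['error']}")
            return op
        if waited >= timeout:
            raise TimeoutError("Batch operation did not finish in time")
        sleep(poll_interval)
        waited += poll_interval


# Fake status source: reports "still running" twice, then "done" (no real API call).
responses = iter([{"done": False}, {"done": False}, {"done": True, "response": {}}])
result = wait_for_operation(lambda: next(responses), sleep=lambda s: None)
print(result["done"])  # True
```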