google-cloud-vision

Google Vision API Document Text multiple images in base64 String


I use the Google Vision API OCR (Document Text Detection) to get the text from a scanned document (base64 string). It works perfectly for one image. But how can I send more than one image, e.g. the second page of a document?

I've tried to merge the base64 strings, but it does not work.

var base64ImagesArrayConcarved = base64ImagesArray.join('')

Solution

  • Cloud Vision API has the method files.asyncBatchAnnotate, which lets you send several files in the same request. Each file is added as its own async file annotation request in the "requests" array. An example that includes two images in one batch request:

    {
      "requests":[
        {
          "inputConfig": {
            "gcsSource": {
              "uri": "gs://<your bucket name>/image1.tif"
            },
            "mimeType": "image/tiff"
          },
          "features": [
            {
              "type": "DOCUMENT_TEXT_DETECTION"
            }
          ],
          "outputConfig": {
            "gcsDestination": {
              "uri": "gs://<your bucket name>/output/"
            }
          }
        },
        {
          "inputConfig": {
            "gcsSource": {
              "uri": "gs://<your bucket name>/image2.tif"
            },
            "mimeType": "image/tiff"
          },
          "features": [
            {
              "type": "DOCUMENT_TEXT_DETECTION"
            }
          ],
          "outputConfig": {
            "gcsDestination": {
              "uri": "gs://<your bucket name>/output/"
            }
          }
        }
      ]
    }
    

    If you are working specifically with PDF files, I found this post that explains how to send a request that also uses asyncBatchAnnotate.
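
If the pages are already in memory as base64 strings, as in the question, another option that may be simpler is the synchronous images.annotate method. Its "requests" array takes one entry per image, so each base64 string goes into its own entry instead of being joined with the others. Below is a minimal sketch, not a definitive implementation. The helper names buildAnnotateBody and annotatePages are made up for illustration. It assumes authentication with an API key, a runtime with a global fetch (a browser or Node 18+), and that the strings may carry a "data:...;base64," prefix that needs stripping. As far as I know there is a per-request limit on the number of images (16 in the documented quotas), so longer documents would need to be split into several calls.

```javascript
// Sketch only: builds one images:annotate body from several base64 strings.
// buildAnnotateBody and annotatePages are hypothetical helper names.
function buildAnnotateBody(base64ImagesArray) {
  return {
    requests: base64ImagesArray.map((b64) => ({
      // Assumption: input may be a data URL; the API expects raw base64,
      // so a leading "data:...;base64," prefix is stripped if present.
      image: { content: b64.replace(/^data:[^;]+;base64,/, '') },
      features: [{ type: 'DOCUMENT_TEXT_DETECTION' }],
    })),
  };
}

// Assumes a global fetch and an API key passed as the "key" query parameter.
async function annotatePages(base64ImagesArray, apiKey) {
  const res = await fetch(
    'https://vision.googleapis.com/v1/images:annotate?key=' +
      encodeURIComponent(apiKey),
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(buildAnnotateBody(base64ImagesArray)),
    }
  );
  if (!res.ok) {
    throw new Error('Vision API request failed: ' + res.status);
  }
  const data = await res.json();
  // One response per request; return the recognized text for each page,
  // or an empty string when a page has no fullTextAnnotation.
  return data.responses.map((r) =>
    r.fullTextAnnotation ? r.fullTextAnnotation.text : ''
  );
}
```

With the array from the question, the call would look like `annotatePages(base64ImagesArray, '<your API key>')`, and joining the returned strings gives the text of the whole document.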