Uploading file directly to Google Cloud Document AI


I am trying to upload a file directly to Google Cloud Document AI for processing, but I receive the following error:

400 Request contains an invalid argument. [field_violations { field: "raw_document.content" description: "Inline document content must be provided."

My code:

from django.http import HttpResponseRedirect
from django.shortcuts import render
from google.api_core.client_options import ClientOptions
from google.cloud import documentai

# UploadReceiptForm and the globals settings module are defined elsewhere in the project.


def upload(request):
    template_name = "upload.html"

    # If this is a POST request, process the form data.
    if request.method == "POST":
        # Create a form instance and populate it with data from the request.
        form = UploadReceiptForm(request.POST, request.FILES)

        # is_valid is a method; it must be called to run validation.
        if form.is_valid():
            docai_client = documentai.DocumentProcessorServiceClient(
                client_options=ClientOptions(
                    api_endpoint=globals.GOOGLE_CLOUD_DOCUMENT_AI_ENDPOINT
                )
            )

            # The full resource name of the processor version, e.g.:
            # `projects/{project_id}/locations/{location}/processors/{processor_id}/processorVersions/{processor_version_id}`
            name = docai_client.processor_version_path(
                globals.GOOGLE_CLOUD_PROJECT_ID,
                globals.GOOGLE_CLOUD_DEFAULT_LOCATION,
                globals.GOOGLE_CLOUD_DEFAULT_PROCESSOR_ID,
                globals.GOOGLE_CLOUD_DEFAULT_PROCESSOR_VERSION_ID,
            )

            # Configure the process request. A separate name keeps the
            # Django request object from being shadowed.
            image_content = request.FILES["file"].read()
            process_request = documentai.ProcessRequest(
                name=name,
                raw_document=documentai.RawDocument(content=image_content, mime_type="image/jpeg"),
            )

            result = docai_client.process_document(request=process_request)

            # Redirect to a new URL.
            return HttpResponseRedirect("/upload/")

    # For a GET (or any other method), create a blank form.
    else:
        form = UploadReceiptForm()

    return render(request, template_name, {"form": form})

Thanks in advance for the help!


Solution

  • It looks like you're following the online processing code sample correctly.

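    However, your processing request doesn't actually include the file content. Check that image_content = request.FILES["file"].read() really returns the file's raw bytes. If the uploaded file is empty, or its stream has already been read to the end, read() returns b"" and the API rejects the request with this error.

    Make sure image_content is neither None nor empty. With the Python client library, RawDocument.content takes raw bytes, so no base64 encoding is needed there; base64 applies only when the document is sent as JSON through the REST API. You can use the sample on this page as a guide.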
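
    As a rough illustration of the check described above, here is a minimal sketch that uses an io.BytesIO object as a stand-in for request.FILES["file"], so it runs without Django or Google Cloud credentials. The helper names read_upload_bytes and rest_json_body are hypothetical and not part of Django or the Document AI library. The sketch assumes the upload object supports seek() and read(), as Django's uploaded files do.

```python
import base64
import io


def read_upload_bytes(uploaded_file):
    """Return the full contents of a file-like upload as raw bytes.

    Rewinds first: if something has already read the stream, the pointer
    sits at the end and a second read() would return b"".
    """
    uploaded_file.seek(0)
    content = uploaded_file.read()
    if not content:
        raise ValueError("uploaded file is empty; nothing to send to Document AI")
    return content


def rest_json_body(content, mime_type):
    """Build a JSON-style body for the REST process method (hypothetical helper).

    Only the REST/JSON route needs base64 text; the Python client's
    RawDocument(content=...) takes the raw bytes directly.
    """
    return {
        "rawDocument": {
            "content": base64.b64encode(content).decode("ascii"),
            "mimeType": mime_type,
        }
    }


# Stand-in for request.FILES["file"]: any object with seek() and read().
fake_upload = io.BytesIO(b"\xff\xd8\xff\xe0 fake jpeg bytes")
fake_upload.read()  # simulate an earlier consumer exhausting the stream
image_content = read_upload_bytes(fake_upload)
```

    In the view, image_content = read_upload_bytes(request.FILES["file"]) would then replace the bare read() call, and an empty upload fails early with a clear message instead of reaching the API.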