google-cloud-platform google-oauth stable-diffusion large-language-model google-generativeai

Is it possible to call Google's Imagen API from a non-interactive back-end?


I want to use Imagen in visual Q&A mode from a non-interactive back-end. The documentation (https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/imagetext-vqa?project=gdg-demos&cloudshell=true) fills in the Bearer token using the gcloud auth print-access-token command. If I run that in Cloud Shell I get a token, but that approach isn't usable in a non-interactive back-end.

    base64_string = base64_bytes.decode(ENCODING)

    VQA_PROMPT = "Describe the content of the image in great detail"

    payload = {
      "instances": [
        {
          "prompt": VQA_PROMPT,
          "image": {
              "bytesBase64Encoded": base64_string
          }
        }
      ],
      "parameters": parameters
    }

    url = "https://us-central1-aiplatform.googleapis.com/v1/projects/gdg-demos/locations/us-central1/publishers/google/models/imagetext:predict"
    headers = {
        "Authorization": "Bearer {}".format(bearer_token),
        "Accept": "application/json; charset=utf-8",
    }
    json_data = requests.post(url, headers=headers, json=payload)

I'm getting a 401 HTTP status code response:

b'{
  "error": {
    "code": 401,
    "message": "Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.",
    "status": "UNAUTHENTICATED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.ErrorInfo",
        "reason": "ACCESS_TOKEN_TYPE_UNSUPPORTED",
        "metadata": {
          "service": "aiplatform.googleapis.com",
          "method": "google.cloud.aiplatform.v1.PredictionService.Predict"
        }
      }
    ]
  }
}'

I tried the steps from https://saturncloud.io/blog/authenticate-to-google-container-service-with-script-noninteractive-gcloud-auth-login/:

  1. Authenticate to GCS: gcloud auth login --brief --quiet
  2. Retrieve refresh token: REFRESH_TOKEN=$(gcloud auth print-access-token)
  3. Activate refresh token: gcloud auth activate-refresh-token $REFRESH_TOKEN

I opened a terminal in the JupyterLab instance I'm tinkering with. I was able to activate a refresh token and got "Activated refresh token credentials: [***]" after the third step. When I used that token as the Bearer token, I got back a 403 (Forbidden) HTTP status code. If I perform a regular (non-brief, non-quiet) gcloud auth print-access-token in that terminal and use that token instead, I get a 403 as well.
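For readers who want to keep the raw REST call, one possible way to get a Bearer token without any interactive gcloud step is to let the google-auth library load Application Default Credentials (for example a service account attached to the runtime) and refresh them in code. The following is a minimal sketch under those assumptions, not a confirmed fix for the 401/403 above; the helper names fetch_access_token and build_headers are made up for illustration, and the google-auth and requests packages are assumed to be installed.

```python
from typing import Callable, Dict

# Broad scope commonly used for Google Cloud APIs.
CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"


def fetch_access_token() -> str:
    """Return a short-lived OAuth 2 access token from Application Default Credentials."""
    # Imported lazily so build_headers() can be exercised without google-auth installed.
    import google.auth
    from google.auth.transport.requests import Request

    credentials, _detected_project = google.auth.default(scopes=[CLOUD_PLATFORM_SCOPE])
    # refresh() performs the token exchange over HTTP; no browser or prompt is involved.
    credentials.refresh(Request())
    return credentials.token


def build_headers(token_provider: Callable[[], str] = fetch_access_token) -> Dict[str, str]:
    """Build the same headers as in the question, with the token obtained in code."""
    return {
        "Authorization": "Bearer {}".format(token_provider()),
        "Accept": "application/json; charset=utf-8",
    }
```

With this in place, headers = build_headers() could replace the hand-built headers dictionary in the request above. Whether the call then succeeds still depends on the underlying identity having permission to use Vertex AI in the project.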


Solution

  • Kudos to Anish Nangia of Google for pointing out that I was looking at the wrong code. The OAuth code in my question won't work. Here is the code I should use instead: https://cloud.google.com/vertex-ai/docs/generative-ai/image/visual-question-answering#-python

    Note that when experimenting in my local Conda Jupyter Notebook (https://github.com/CsabaConsulting/NextGenAI/blob/main/ImagenTest.ipynb) I still need to deal with ADC (Application Default Credentials); see https://cloud.google.com/docs/authentication#auth-decision-tree and https://cloud.google.com/docs/authentication/application-default-credentials. Then you'll get the message "Your application is authenticating by using local Application Default Credentials. The aiplatform.googleapis.com API requires a quota project, which is not set by default. To learn how to set your quota project...". So there are some interesting hoops to jump through, but they can be tackled.
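One way the quota project hoop might be handled in a local notebook is to pass a quota project when loading ADC and hand the resulting credentials to vertexai.init. This is a sketch under assumptions: load_adc and its default_fn parameter are hypothetical names introduced here (the latter only so the function can be tried without real credentials), and the project ID is the one used elsewhere in this answer. The gcloud CLI also has gcloud auth application-default set-quota-project, which stores the quota project in the local ADC file instead.

```python
from typing import Any, Callable, Optional, Tuple

CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"


def load_adc(
    quota_project_id: str,
    default_fn: Optional[Callable[..., Tuple[Any, Any]]] = None,
) -> Any:
    """Load Application Default Credentials with an explicit quota project."""
    if default_fn is None:
        # Imported lazily; requires the google-auth package.
        import google.auth

        default_fn = google.auth.default
    credentials, _detected_project = default_fn(
        scopes=[CLOUD_PLATFORM_SCOPE],
        quota_project_id=quota_project_id,
    )
    return credentials


# Possible usage in a notebook (assumes google-cloud-aiplatform is installed
# and `gcloud auth application-default login` has been run beforehand):
#
#   import vertexai
#   vertexai.init(
#       project="gdg-demos",
#       location="us-central1",
#       credentials=load_adc("gdg-demos"),
#   )
```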

    When deploying to a Cloud Function, you need to set up the right service account. Example code: https://github.com/CsabaConsulting/NextGenAI/tree/main/imagen_test

    requirements.txt:

    functions-framework==3.*
    google-cloud-aiplatform==1.35.*
    

    main.py:

    import base64
    import functions_framework
    import vertexai
    
    from flask import jsonify
    from vertexai.vision_models import ImageQnAModel, Image
    
    PROJECT_ID = "gdg-demos"
    LOCATION = "us-central1"
    
    @functions_framework.http
    def imagen_test(request):
        """HTTP Cloud Function.
        Args:
            request (flask.Request): The request object.
            <https://flask.palletsprojects.com/en/1.1.x/api/#incoming-request-data>
        Returns:
            The response text, or any set of values that can be turned into a
            Response object using `make_response`
            <https://flask.palletsprojects.com/en/1.1.x/api/#flask.make_response>.
        """
        request_json = request.get_json(silent=True)
        request_args = request.args
    
        if request_json and 'image' in request_json:
            image_b64 = request_json['image']
        elif request_args and 'image' in request_args:
            image_b64 = request_args['image']
        else:
            image_b64 = None
    
        if not image_b64:
            return jsonify(dict(data=[]))
    
        vertexai.init(project=PROJECT_ID, location=LOCATION)
        model = ImageQnAModel.from_pretrained("imagetext@001")
    
        image_binary = base64.b64decode(image_b64)
        image = Image(image_binary)
        answers = model.ask_question(
            image=image,
            question="Describe what is on the photo in great detail, be very verbose",
            number_of_results=3,
        )
        return jsonify(dict(data=answers))
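
A back-end could then call the deployed function roughly as sketched below. This is an illustrative client, not part of the linked repository: FUNCTION_URL is a placeholder, build_request and describe_image are hypothetical helper names, and the sketch assumes the function accepts unauthenticated invocations (otherwise an identity token would have to be added to the request). It only relies on the request and response shapes visible in main.py above: a JSON body with a base64 "image" field, and a JSON response with a "data" list.

```python
import base64
import json
import urllib.request

# Placeholder; replace with the trigger URL of your deployed function.
FUNCTION_URL = "https://REGION-PROJECT_ID.cloudfunctions.net/imagen_test"


def build_request(image_bytes: bytes, url: str = FUNCTION_URL) -> urllib.request.Request:
    """Wrap raw image bytes in the JSON body that imagen_test expects."""
    body = json.dumps(
        {"image": base64.b64encode(image_bytes).decode("ascii")}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe_image(path: str, url: str = FUNCTION_URL) -> list:
    """Send the image file at `path` to the function and return its answers."""
    with open(path, "rb") as image_file:
        request = build_request(image_file.read(), url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["data"]
```

Sending the image in a POST body rather than as a query argument avoids URL length limits for larger photos; the function above accepts both.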