google-cloud-platform google-oauth stable-diffusion large-language-model google-generativeai

Is it possible to call Google's Imagen API from a non-interactive back-end?

I want to use Imagen in QnA (visual question answering) mode from a non-interactive back-end. The documentation (https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/imagetext-vqa?project=gdg-demos&cloudshell=true) fills in a Bearer token using the gcloud auth print-access-token command. If I run that in Cloud Shell I get a token, but that approach isn't usable from a non-interactive back-end.

    base64_string = base64_bytes.decode(ENCODING)

    VQA_PROMPT = "Describe the content of the image in great detail"

    payload = {
      "instances": [
        {
          "prompt": VQA_PROMPT,
          "image": {
              "bytesBase64Encoded": base64_string
          }
        }
      ],
      "parameters": parameters
    }

    url = "https://us-central1-aiplatform.googleapis.com/v1/projects/gdg-demos/locations/us-central1/publishers/google/models/imagetext:predict"
    headers = {
        "Authorization": "Bearer {}".format(bearer_token),
        "Accept": "application/json; charset=utf-8",
    }
    json_data = requests.post(url, headers=headers, json=payload)

I'm getting a 401 HTTP status code response:

b'{
  "error": {
    "code": 401,
    "message": "Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.",
    "status": "UNAUTHENTICATED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.ErrorInfo",
        "reason": "ACCESS_TOKEN_TYPE_UNSUPPORTED",
        "metadata": {
          "service": "aiplatform.googleapis.com",
          "method": "google.cloud.aiplatform.v1.PredictionService.Predict"
        }
      }
    ]
  }
}'
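When debugging a response like this, it can help to pull the `reason` field out of the error body instead of eyeballing the JSON. A minimal sketch using only the standard library; `extract_error_reasons` is a hypothetical helper name, and `sample_body` is an abbreviated copy of the response above:

```python
import json


def extract_error_reasons(body):
    """Return (status, [reasons]) from a Google API JSON error body."""
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    error = json.loads(body).get("error", {})
    # Each ErrorInfo entry under "details" may carry a machine-readable reason.
    reasons = [
        detail["reason"]
        for detail in error.get("details", [])
        if "reason" in detail
    ]
    return error.get("status"), reasons


# Abbreviated copy of the 401 body shown above.
sample_body = b"""{
  "error": {
    "code": 401,
    "status": "UNAUTHENTICATED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.ErrorInfo",
        "reason": "ACCESS_TOKEN_TYPE_UNSUPPORTED"
      }
    ]
  }
}"""

status, reasons = extract_error_reasons(sample_body)
print(status, reasons)
```

Here the reason is ACCESS_TOKEN_TYPE_UNSUPPORTED, which suggests the server objected to the kind of token it received rather than to the permissions behind it.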

I then tried the steps from https://saturncloud.io/blog/authenticate-to-google-container-service-with-script-noninteractive-gcloud-auth-login/:

Authenticate to GCS: gcloud auth login --brief --quiet
Retrieve refresh token: REFRESH_TOKEN=$(gcloud auth print-access-token)
Activate refresh token: gcloud auth activate-refresh-token $REFRESH_TOKEN

I opened a terminal in the JupyterLab instance I'm tinkering with. I was able to activate a refresh token and got the message Activated refresh token credentials: [***] after the third step. I then tried to use that token as the Bearer token, but got back a 403 HTTP status code (Forbidden). The same happens if I run a regular (non-brief, non-quiet) gcloud auth print-access-token in that terminal and use that token: I get a 403 as well.

Solution

Kudos to Anish Nangia of Google for pointing out that I was looking at the wrong code. The OAuth code in my question won't work. Here is the code I should use: https://cloud.google.com/vertex-ai/docs/generative-ai/image/visual-question-answering#-python

Note that when experimenting in my local Conda Jupyter Notebooks (https://github.com/CsabaConsulting/NextGenAI/blob/main/ImagenTest.ipynb) I still need to deal with ADC (Application Default Credentials); see https://cloud.google.com/docs/authentication#auth-decision-tree and https://cloud.google.com/docs/authentication/application-default-credentials. After that you'll get a warning: "Your application is authenticating by using local Application Default Credentials. The aiplatform.googleapis.com API requires a quota project, which is not set by default. To learn how to set your quota project...". So there are some hoops to jump through, but they can be tackled.
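For reference, a minimal sketch of how ADC can be resolved in Python with the google-auth library. It assumes ADC is already configured for the environment (for example via `gcloud auth application-default login` locally, or an attached service account when deployed) and that the google-auth and requests packages are installed. The helper names `fetch_access_token` and `adc_access_token` are hypothetical; `google.auth.default` and `credentials.refresh` come from google-auth. Passing `quota_project_id` is one way to address the quota project warning; `gcloud auth application-default set-quota-project` is another.

```python
CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"


def fetch_access_token(credentials, request):
    """Refresh the given credentials and return the current access token."""
    credentials.refresh(request)
    return credentials.token


def adc_access_token(quota_project_id=None):
    """Resolve Application Default Credentials and return an access token.

    Assumption: google-auth and requests are installed and ADC is configured.
    """
    # Imported lazily so this module still loads without google-auth installed.
    import google.auth
    from google.auth.transport.requests import Request

    credentials, _project = google.auth.default(
        scopes=[CLOUD_PLATFORM_SCOPE],
        quota_project_id=quota_project_id,
    )
    return fetch_access_token(credentials, Request())
```

Calling `adc_access_token(quota_project_id="gdg-demos")` in a notebook is a quick way to check that credentials resolve at all. The Vertex AI SDK looks up ADC on its own, so the SDK-based code does not need this helper.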

When deployed in a Cloud Function, you need to set up the right service account. Example code: https://github.com/CsabaConsulting/NextGenAI/tree/main/imagen_test

requirements.txt:

functions-framework==3.*
google-cloud-aiplatform==1.35.*

main.py:

import base64
import functions_framework
import vertexai

from flask import jsonify
from vertexai.vision_models import ImageQnAModel, Image

PROJECT_ID = "gdg-demos"
LOCATION = "us-central1"

@functions_framework.http
def imagen_test(request):
    """HTTP Cloud Function.
    Args:
        request (flask.Request): The request object.
        <https://flask.palletsprojects.com/en/1.1.x/api/#incoming-request-data>
    Returns:
        The response text, or any set of values that can be turned into a
        Response object using `make_response`
        <https://flask.palletsprojects.com/en/1.1.x/api/#flask.make_response>.
    """
    request_json = request.get_json(silent=True)
    request_args = request.args

    if request_json and 'image' in request_json:
        image_b64 = request_json['image']
    elif request_args and 'image' in request_args:
        image_b64 = request_args['image']
    else:
        image_b64 = None

    if not image_b64:
        return jsonify(dict(data=[]))

    vertexai.init(project=PROJECT_ID, location=LOCATION)
    model = ImageQnAModel.from_pretrained("imagetext@001")

    image_binary = base64.b64decode(image_b64)
    image = Image(image_binary)
    answers = model.ask_question(
        image=image,
        question="Describe what is on the photo in great detail, be very verbose",
        number_of_results=3,
    )
    return jsonify(dict(data=answers))
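
To exercise the function once it is deployed, a client has to send the image as a base64 string under the `image` key, since that is what `imagen_test` reads. A minimal client sketch using only the standard library; `FUNCTION_URL` is a placeholder rather than the real endpoint, the helper names are hypothetical, and the function is assumed to accept unauthenticated calls (otherwise an `Authorization` header would also be needed):

```python
import base64
import json
import urllib.request

# Placeholder: replace with the URL of the deployed function.
FUNCTION_URL = "https://REGION-PROJECT.cloudfunctions.net/imagen_test"


def build_request_body(image_bytes):
    """Encode raw image bytes into the JSON body the function expects."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image": image_b64}).encode("utf-8")


def describe_image(image_bytes, url=FUNCTION_URL, timeout=60):
    """POST the image to the function and return the list under "data"."""
    request = urllib.request.Request(
        url,
        data=build_request_body(image_bytes),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["data"]
```

Sending the image in a POST body rather than as a query argument avoids URL length limits, which base64-encoded images can easily exceed.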