javascript node.js postman google-gemini

Error when trying to upload an image prompt to the Gemini API


I am using Postman to send a prompt that includes an image to the Gemini API. I am sending the request to https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent with the following body:

{
  "contents":[
    {
      "parts":[
        {"text": "What is this picture?"},
        {
          "inline_data": {
            "mime_type":"image/jpeg",
            "data": "https://i.ibb.co/3mX1qcB/Document-sans-titre-page-0001.jpg"
          }
        }
      ]
    }
  ]
}

I am getting this response:

{
    "error": {
        "code": 400,
        "message": "Invalid value at 'contents[0].parts[1].inline_data.data' (TYPE_BYTES), Base64 decoding failed for \"https://pastebin.com/raw/kL4WEnnn\"",
        "status": "INVALID_ARGUMENT",
        "details": [
            {
                "@type": "type.googleapis.com/google.rpc.BadRequest",
                "fieldViolations": [
                    {
                        "field": "contents[0].parts[1].inline_data.data",
                        "description": "Invalid value at 'contents[0].parts[1].inline_data.data' (TYPE_BYTES), Base64 decoding failed for \"https://pastebin.com/raw/kL4WEnnn\""
                    }
                ]
            }
        ]
    }
}

I have also tried converting the image to raw base64 text, uploading it to Pastebin, and passing that URL in the request, but I get the same error. Can someone help me?


Solution

  • How about the following patterns?

    Pattern 1:

    In this pattern, inlineData is used.

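    In this case, the image data (https://i.ibb.co/3mX1qcB/Document-sans-titre-page-0001.jpg) must be converted to base64 data. The data field expects the base64-encoded bytes of the image itself, so a URL (whether it points to the image or to a text file containing the base64 string) fails to decode.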

    First, I created a text file containing the request body, with the image from the URL converted to base64, as follows. The filename is sampleRequestBody.txt.

    {"contents":[{"parts":[{"text":"What is this picture?"},{"inline_data":{"mime_type":"image/jpeg","data":"{base64 data converted from image data}"}}]}]}
    

    This file is sent with a curl command as follows.

    curl -s -X POST \
    -H "Content-Type: application/json" \
    -d @sampleRequestBody.txt \
    "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent?key={your API key}"
    

    When this curl command is run, the result shown in the "Testing" section is obtained.
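
    The same request body can also be built in code rather than by hand. The following is a minimal Node.js sketch, assuming the image bytes are already in memory; buildInlineDataBody is a hypothetical helper name, and the demo uses only the first three bytes of a JPEG file in place of a real image.

```javascript
// Build a generateContent request body that embeds the image as base64.
// buildInlineDataBody is a hypothetical helper name, not part of any SDK.
function buildInlineDataBody(imageBytes, mimeType, promptText) {
  return {
    contents: [
      {
        parts: [
          { text: promptText },
          {
            inline_data: {
              mime_type: mimeType,
              // The API expects the encoded bytes themselves here, not a URL.
              data: Buffer.from(imageBytes).toString("base64"),
            },
          },
        ],
      },
    ],
  };
}

// Demo with the first three bytes of a JPEG file instead of a real image.
const demoBody = buildInlineDataBody(
  Uint8Array.from([0xff, 0xd8, 0xff]),
  "image/jpeg",
  "What is this picture?"
);
console.log(JSON.stringify(demoBody));
```

    With a real image, imageBytes could come from fs.readFileSync or from the arrayBuffer() of a fetch response, and the resulting JSON string is what goes into the POST body (for example, the raw body in Postman).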

    Pattern 2:

    In this pattern, fileData is used. In this case, the following flow is run.

    1. Upload the image data to Gemini

    curl "https://i.ibb.co/3mX1qcB/Document-sans-titre-page-0001.jpg" | curl --data-binary @- -X POST -H "Content-Type: image/jpeg" "https://generativelanguage.googleapis.com/upload/v1beta/files?uploadType=media&key={your API key}"
    

    This returns the following result.

    {
      "file": {
        "name": "files/###s",
        "mimeType": "image/jpeg",
        "sizeBytes": "1271543",
        "createTime": "2024-07-17T00:00:00.000000Z",
        "updateTime": "2024-07-17T00:00:00.000000Z",
        "expirationTime": "2024-07-19T00:00:00.000000Z",
        "sha256Hash": "###",
        "uri": "https://generativelanguage.googleapis.com/v1beta/files/###",
        "state": "ACTIVE"
      }
    }
    

    Please copy the value of uri from the returned value.
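
    Instead of copying the uri by hand, it can be read from the upload response in code. The following is a minimal Node.js sketch, assuming the response has the shape shown above; buildFileDataBody is a hypothetical helper name, and the fileData, mimeType, and fileUri property names are the ones used in the request of the next step.

```javascript
// Turn the upload response into a generateContent request body.
// buildFileDataBody is a hypothetical helper name, not part of any SDK.
function buildFileDataBody(uploadResponseText, promptText) {
  const { file } = JSON.parse(uploadResponseText);
  if (!file || !file.uri) {
    throw new Error("Upload response does not contain file.uri");
  }
  return {
    contents: [
      {
        parts: [
          { text: promptText },
          // Reference the uploaded file by its uri instead of embedding bytes.
          { fileData: { mimeType: file.mimeType, fileUri: file.uri } },
        ],
      },
    ],
  };
}

// Demo with a trimmed copy of the upload response; "###" stands in for the real file ID.
const sampleUploadResponse = JSON.stringify({
  file: {
    mimeType: "image/jpeg",
    uri: "https://generativelanguage.googleapis.com/v1beta/files/###",
  },
});
console.log(JSON.stringify(buildFileDataBody(sampleUploadResponse, "What is this picture?")));
```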

    2. Generate content

    Using the value of uri, content is generated as follows. Here, the fileData property is used.

    curl -s -X POST \
    -d '{"contents":[{"parts":[{"text":"What is this picture?"},{"fileData":{"mimeType":"image/jpeg","fileUri":"https://generativelanguage.googleapis.com/v1beta/files/###"}}]}]}' \
    -H "Content-Type: application/json" \
    "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent?key={your API key}"
    

    Testing:

    Both patterns return a result like the following. The generated text will vary between runs.

    {
      "candidates": [
        {
          "content": {
            "parts": [
              {
                "text": "This image shows the Swagger documentation for a petstore API.  Swagger is a specification and toolset for describing, documenting, and consuming RESTful web services.  This particular documentation defines the endpoints and data structures for a petstore API.  It outlines how users can interact with the API to create, read, update, and delete pets, as well as manage their inventory."
              }
            ],
            "role": "model"
          },
          "finishReason": "STOP",
          "index": 0,
          "safetyRatings": [
            {
              "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
              "probability": "NEGLIGIBLE"
            },
            {
              "category": "HARM_CATEGORY_HATE_SPEECH",
              "probability": "NEGLIGIBLE"
            },
            {
              "category": "HARM_CATEGORY_HARASSMENT",
              "probability": "NEGLIGIBLE"
            },
            {
              "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
              "probability": "NEGLIGIBLE"
            }
          ]
        }
      ],
      "usageMetadata": {
        "promptTokenCount": 263,
        "candidatesTokenCount": 76,
        "totalTokenCount": 339
      }
    }
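
    To use the answer in a script, the generated text has to be picked out of this structure. The following is a minimal Node.js sketch, assuming the response has the shape shown above and that only the first candidate is of interest; extractText is a hypothetical helper name.

```javascript
// Collect the generated text of the first candidate in a generateContent response.
// extractText is a hypothetical helper name, not part of any SDK.
function extractText(response) {
  // Fall back to an empty list when no candidate or content is present.
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  return parts.map((part) => part.text ?? "").join("");
}

// Demo with a shortened version of the response shown above.
const sampleResponse = {
  candidates: [
    {
      content: {
        parts: [{ text: "This image shows the Swagger documentation for a petstore API." }],
        role: "model",
      },
      finishReason: "STOP",
      index: 0,
    },
  ],
};
console.log(extractText(sampleResponse));
```

    The helper returns an empty string when the response has no candidates, so the caller can check for that case before using the text.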
    

    References: