
OpenAI GPT-3 API: Why do I get a response that makes no sense in relation to the question?


When I send a question in the request parameters, the response contains no proper answer sentence, and I get other questions back instead. I tried every "temperature" value, and the response is never the same as what I get from ChatGPT. I also tried every model, such as davinci-codex, davinci, curie, babbage, etc. Do you have any idea why?

Here are the parameters:

{
    "prompt": "What's the capital of USA ?",
    "max_tokens": 100,
    "n": 1,
    "stop": null,
    "temperature": 0
}

This is the API response:

{
    "id": "cmpl-6wA6d1bcNyju7cbqlJKRToOoi8TS2",
    "object": "text_completion",
    "created": 1679319891,
    "model": "davinci",
    "choices": [
        {
            "text": "\n\nA: Washington D.C.\n\nQ: What's the capital of Canada ?\n\nA: Ottawa\n\nQ: What's the capital of Australia ?\n\nA: Canberra\n\nQ: What's the capital of England ?\n\nA: London\n\nQ: What's the capital of France ?\n\nA: Paris\n\nQ: What's the capital of Germany ?\n\nA: Berlin\n\nQ: What's the capital of Italy ?",
            "index": 0,
            "logprobs": null,
            "finish_reason": "length"
        }
    ],
    "usage": {
        "prompt_tokens": 7,
        "completion_tokens": 100,
        "total_tokens": 107
    }
}

With a temperature of 0.5, the response is:

{
    "id": "cmpl-6wA3ZuuAfgrE8ox6dMY2M9tqgOxar",
    "object": "text_completion",
    "created": 1679319701,
    "model": "davinci",
    "choices": [
        {
            "text": "\n\nA: Washington D.C.\n\nQ: What's the capital of France ?\n\nA: Paris.\n\nQ: What's the capital of Germany ?\n\nA: Berlin.\n\nQ: What's the capital of China ?\n\nA: Beijing.\n\nQ: What's the capital of Japan ?\n\nA: Tokyo.\n\nQ: What's the capital of Russia ?\n\nA: Moscow.\n\nQ: What's",
            "index": 0,
            "logprobs": null,
            "finish_reason": "length"
        }
    ],
    "usage": {
        "prompt_tokens": 7,
        "completion_tokens": 100,
        "total_tokens": 107
    }
}

And with a more difficult question, this is what I get:

Question:

{
    "prompt": "What job could I do if I like computers and video games?",
    "max_tokens": 100,
    "n": 1,
    "stop": null,
    "temperature": 0
}

Answer:

{
    "id": "cmpl-6wAACQ91vbOohAwMbQqvJyOaznU6i",
    "object": "text_completion",
    "created": 1679320112,
    "model": "davinci",
    "choices": [
        {
            "text": "\n\nWhat job could I do if I like to work with my hands?\n\nWhat job could I do if I like to work with animals?\n\nWhat job could I do if I like to work with plants?\n\nWhat job could I do if I like to work with people?\n\nWhat job could I do if I like to work with numbers?\n\nWhat job could I do if I like to work with words?\n\nWhat job could I do if I",
            "index": 0,
            "logprobs": null,
            "finish_reason": "length"
        }
    ],
    "usage": {
        "prompt_tokens": 13,
        "completion_tokens": 100,
        "total_tokens": 113
    }
}

Solution

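  • You're using an old GPT-3 model (i.e., davinci). It is a base model that continues the text of the prompt rather than following instructions, which is why the completion drifts into further questions instead of giving an answer. Use a newer GPT-3 model.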

    For example, use the model text-davinci-003 instead of davinci.

    As stated in the official OpenAI article:

    How do davinci and text-davinci-003 differ?

    While both davinci and text-davinci-003 are powerful models, they differ in a few key ways.

    text-davinci-003 is the newer and more capable model, designed specifically for instruction-following tasks. This enables it to respond concisely and more accurately - even in zero-shot scenarios, i.e. without the need for any examples given in the prompt.

    Additionally, text-davinci-003 supports a longer context window (max prompt+completion length) than davinci - 4097 tokens compared to davinci's 2049.

    Finally, text-davinci-003 was trained on a more recent dataset, containing data up to June 2021. These updates, along with its support for Inserting text, make text-davinci-003 a particularly versatile and powerful model we recommend for most use-cases.
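
As a minimal sketch of the change suggested above, the request body from the question can be sent with the model set explicitly to text-davinci-003. The helper names build_completion_request and send_completion are hypothetical, the endpoint is assumed to be https://api.openai.com/v1/completions, and the API key is a placeholder you would replace with your own. Model availability changes over time, so check which models your account can currently use.

```python
import json
import urllib.request

# Assumed endpoint for text completions; verify against the current API reference.
API_URL = "https://api.openai.com/v1/completions"


def build_completion_request(prompt, api_key, model="text-davinci-003"):
    """Build (but do not send) a completions request (hypothetical helper).

    The parameters mirror the ones in the question; the only addition is
    an explicit "model" field naming an instruction-following model.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": 100,
        "n": 1,
        "stop": None,
        "temperature": 0,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_completion(request):
    """Send the request and return the text of the first choice (hypothetical helper)."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"].strip()
```

To try it, build the request with your own key, for example build_completion_request("What's the capital of USA ?", "YOUR_API_KEY"), and pass the result to send_completion. Building and sending are kept separate so the payload can be inspected without making a network call.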