I am using the OpenAI GPT-4 API (also tested with Llama 2) to generate responses for a chatbot. However, I noticed that the model sometimes gives different answers even when the prompt is exactly the same.
Here is a minimal example:
    import openai

    prompt = "What is the capital of France?"

    for _ in range(3):
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,  # Even at 0.7, I expect some consistency
        )
        print(response["choices"][0]["message"]["content"])
I expected consistent answers (e.g., "Paris"), but sometimes the response contains additional explanations, while other times it's short.

Why does this happen even though I set the temperature explicitly? How can I force a deterministic response?
The `temperature` parameter controls how much randomness is used when sampling each token (it is often described informally as "creativity"). At 0.0 the model picks the most likely token at every step (greedy decoding), which is nearly deterministic; higher values flatten the probability distribution so less likely tokens get picked more often, producing more varied wording. Your snippet uses 0.7, which deliberately allows variation, so differing phrasings between runs are expected behavior, not a bug. Try `temperature=0.0` and you should see much more consistent answers. Note, however, that even at 0.0 the OpenAI API does not guarantee bit-identical outputs for GPT-4 across calls, so "mostly deterministic" is the realistic expectation.
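To see why 0.0 behaves differently from 0.7, here is a toy sketch of temperature-scaled sampling. This is an illustration of the general technique, not OpenAI's actual implementation; the logits and token strings are made up for the example:

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Sample an index from logits after temperature scaling.

    temperature == 0 is treated as greedy argmax (deterministic);
    higher temperatures flatten the distribution before sampling.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()                             # inverse-CDF sampling
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

# Hypothetical next-token candidates with made-up logits:
logits = [2.0, 1.5, 1.0]

rng = random.Random(42)
greedy = {sample_with_temperature(logits, 0.0, rng) for _ in range(10)}
sampled = {sample_with_temperature(logits, 0.7, rng) for _ in range(50)}
print(greedy)   # only ever the argmax index
print(sampled)  # typically several different indices
```

At temperature 0 the same index comes back every time, while at 0.7 repeated calls hit different candidates, which is exactly the run-to-run variation you observed in the chatbot's answers.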