machine-learning, deep-learning, artificial-intelligence, google-gemini

What are Tokens, Top K and Top P?


I'm learning to use Google AI Studio and, when generating the code snippet, I came across these terms:

const generationConfig = {
  temperature: 1,
  topP: 0.95,
  topK: 64,
  maxOutputTokens: 8192,
  responseMimeType: "text/plain",
};

I'm struggling to understand what those terms mean. What are topP, topK, and maxOutputTokens? I want to understand them so I can use them properly.


Solution

  • You can find those details at the model parameters documentation.

    But in short:

    • max output tokens (maxOutputTokens) limits the maximum length of the response. You literally cap how short (or long) the answer can be, measured in tokens. Roughly speaking, as a reference, 100 tokens is around 60-80 words. (The usage sketch after these bullets shows where this value is passed when you call the model.)

    Gemini is a generative model, which means that, at a high level, it "composes" (or generates) an answer token by token using its knowledge of a given language (a spoken language, a programming language, etc.). So you can picture a bag of possible "next tokens" at each step of writing a sentence, and top-k and top-p customize which part of that vocabulary is considered (the toy sketch after these bullets walks through the same steps on a made-up list of tokens).

    • with top-k you basically limit the universe of possible tokens. If the next token could be any of 200 different candidates, you keep only the k most probable ones. So top-k = 30 means the model considers only the 30 most likely tokens from that list. The next token is not picked at this step yet.

    • with top-p you then apply a limit based on cumulative probability. Each candidate token has a probability, assigned by the model, of following the tokens that came before it. So if you set top-p = 0.2 (i.e. 20%), then from the 30 tokens you kept with top-k, it builds a new list containing the most probable tokens whose probabilities sum to at most 20%. E.g. if the first token has a 10% probability, the second 5%, the third 4% and the fourth 2%, the list after the top-p filtering will contain the first, the second and the third (10% + 5% + 4% = 19%), since adding the fourth would exceed 20%. The next token is still not picked at this step either.

    • finally comes the temperature parameter, which defines how deterministically the next token is picked from that filtered list. A temperature of 0 gives the most deterministic choice, where the token with the highest probability is always chosen; the maximum temperature gives the most random choice, where even the least probable tokens in the list may be picked.
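
    To make those three steps concrete, here is a minimal toy sketch in JavaScript. The token strings and probabilities are invented for the example, and the real implementation inside Gemini certainly differs (it scores the whole vocabulary and works on logits), so treat it only as an illustration of the order described above: top-k shrinks the candidate list, top-p shrinks it again by cumulative probability, and temperature controls how the final pick is made.

    // Pretend the model proposed these next-token probabilities (already sorted).
    const candidates = [
      { token: " the", prob: 0.10 },
      { token: " a",   prob: 0.05 },
      { token: " an",  prob: 0.04 },
      { token: " one", prob: 0.02 },
      { token: " my",  prob: 0.01 },
      // ...the rest of the vocabulary, with even smaller probabilities
    ];

    // 1. top-k: keep only the k most probable candidates.
    function topK(cands, k) {
      return [...cands].sort((a, b) => b.prob - a.prob).slice(0, k);
    }

    // 2. top-p: keep candidates while their cumulative probability stays within p.
    function topP(cands, p) {
      const kept = [];
      let cumulative = 0;
      for (const c of cands) {
        if (kept.length > 0 && cumulative + c.prob > p) break;
        kept.push(c);
        cumulative += c.prob;
      }
      return kept;
    }

    // 3. temperature: reshape the surviving probabilities and sample one token.
    //    prob ** (1 / temperature) is the same as dividing the log-probability by
    //    the temperature: temperature -> 0 makes the top candidate dominate,
    //    higher temperature flattens the distribution (more randomness).
    function sampleWithTemperature(cands, temperature) {
      if (temperature === 0) return cands[0].token; // fully deterministic (greedy)
      const weights = cands.map((c) => c.prob ** (1 / temperature));
      const total = weights.reduce((sum, w) => sum + w, 0);
      let r = Math.random() * total;
      for (let i = 0; i < cands.length; i++) {
        r -= weights[i];
        if (r <= 0) return cands[i].token;
      }
      return cands[cands.length - 1].token;
    }

    // Same numbers as the example above: top-k = 30, top-p = 0.2 (20%).
    const afterTopK = topK(candidates, 30); // fewer than 30 candidates here, so all survive
    const afterTopP = topP(afterTopK, 0.2); // keeps " the", " a", " an" (10% + 5% + 4% = 19%)
    console.log(afterTopP.map((c) => c.token)); // [ " the", " a", " an" ]
    console.log(sampleWithTemperature(afterTopP, 1)); // the actual pick of the next token

    With the values from the question (topK: 64, topP: 0.95, temperature: 1) the filtered list stays relatively large and the pick relatively random, which is why the answers vary from run to run.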
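
    And to show where the generationConfig from the question (including maxOutputTokens) is actually applied, here is a usage sketch assuming the @google/generative-ai Node.js package that the AI Studio snippet is generated for; the model name, the API key variable and the prompt are just placeholders.

    const { GoogleGenerativeAI } = require("@google/generative-ai");

    // The same config object as in the question.
    const generationConfig = {
      temperature: 1,
      topP: 0.95,
      topK: 64,
      maxOutputTokens: 8192,
      responseMimeType: "text/plain",
    };

    const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

    // The config is attached to the model, so every call uses these sampling settings.
    const model = genAI.getGenerativeModel({
      model: "gemini-1.5-flash",
      generationConfig,
    });

    async function run() {
      // The response is cut off once it reaches maxOutputTokens tokens.
      const result = await model.generateContent("Explain what a token is.");
      console.log(result.response.text());
    }

    run();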

    hope that helps.