Tags: r, nlp, openai-api

OpenAI API: How to count tokens before API request


I would like to count the tokens of my OpenAI API request in R before sending it (model gpt-3.5-turbo). Since the OpenAI API has rate limits, this seems important to me.

Example, with the function I use to send requests:


library(httr)
library(tidyverse)

api_key <- "your_openai_api_key"

# Calls the ChatGPT API with the given prompt and returns the answer
ask_chatgpt <- function(prompt) {
  response <- POST(
    url = "https://api.openai.com/v1/chat/completions", 
    add_headers(Authorization = paste("Bearer", api_key)),
    content_type_json(),
    encode = "json",
    body = list(
      model = "gpt-3.5-turbo",
      messages = list(list(
        role = "user", 
        content = prompt
      ))
    )
  )
  str_trim(content(response)$choices[[1]]$message$content)
}

prompt <- "how do I count the token in R for gpt-3.45-turbo?"

ask_chatgpt(prompt)
#> [1] "As an AI language model, I am not sure what you mean by \"count the token in R for gpt-3.5-turbo.\" Please provide more context or clarification so that I can better understand your question and provide an appropriate answer."

Created on 2023-04-24 with reprex v2.0.2

I would like to calculate/estimate how many tokens the prompt will need with gpt-3.5-turbo.

There is a similar question for gpt-3 and Python, where the tiktoken library is recommended. However, I could not find a similar library in R.

OpenAI also recommends tiktoken (for Python) or the gpt-3-encoder package for JavaScript.


Solution

  • OpenAI has their own tokenizer, so you probably won't be able to reproduce it exactly in pure R. Instead, I would recommend calling their Python tokenizer, tiktoken, via the reticulate package.

    First, install the tiktoken package via the command line using:

    pip install tiktoken
    

    Then, in R

    library(reticulate)
    tiktoken <- import("tiktoken")

    # Load the tokenizer used by gpt-3.5-turbo
    encoding <- tiktoken$encoding_for_model("gpt-3.5-turbo")

    prompt <- "how do I count the token in R for gpt-3.45-turbo?"
    length(encoding$encode(prompt))
    # [1] 19
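
    If you want to budget for a whole chat request rather than a single string, note that each message in the messages list carries a few tokens of overhead on top of its content. A rough sketch, reusing the encoding object from above; the overhead constants follow OpenAI's cookbook figures for gpt-3.5-turbo-0301 and may differ for newer model snapshots:

    # Counts tokens for a list of chat messages, including per-message
    # overhead. tokens_per_message = 4 and reply_priming = 3 are the
    # cookbook values for gpt-3.5-turbo-0301; treat them as estimates.
    count_chat_tokens <- function(messages, encoding,
                                  tokens_per_message = 4, reply_priming = 3) {
      n <- reply_priming  # every reply is primed with <|start|>assistant<|message|>
      for (msg in messages) {
        n <- n + tokens_per_message
        for (value in msg) {
          n <- n + length(encoding$encode(value))
        }
      }
      n
    }

    messages <- list(list(role = "user", content = prompt))
    count_chat_tokens(messages, encoding)

    This estimates the tokens consumed by the request itself; the completion tokens come on top and count against the same rate limit.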