Tags: r, nlp, openai-api

OpenAI API: How to count tokens before API request


I would like to count the tokens of my OpenAI API request in R before sending it (model gpt-3.5-turbo). Since the OpenAI API has rate limits, this seems important to me.

Example, with the function I use to send requests:


library(httr)
library(tidyverse)

api_key <- "your_openai_api_key"

# Calls the ChatGPT API with the given prompt and returns the answer
ask_chatgpt <- function(prompt) {
  response <- POST(
    url = "https://api.openai.com/v1/chat/completions", 
    add_headers(Authorization = paste("Bearer", api_key)),
    content_type_json(),
    encode = "json",
    body = list(
      model = "gpt-3.5-turbo",
      messages = list(list(
        role = "user", 
        content = prompt
      ))
    )
  )
  str_trim(content(response)$choices[[1]]$message$content)
}

prompt <- "how do I count the token in R for gpt-3.45-turbo?"

ask_chatgpt(prompt)
#> [1] "As an AI language model, I am not sure what you mean by \"count the token in R for gpt-3.5-turbo.\" Please provide more context or clarification so that I can better understand your question and provide an appropriate answer."

Created on 2023-04-24 with reprex v2.0.2

I would like to calculate/estimate how many tokens the prompt will need with gpt-3.5-turbo.

There is a similar question for gpt-3 and Python, where the tiktoken library is recommended. However, I could not find a similar library in R.

OpenAI also recommends tiktoken (for Python) or the gpt-3-encoder package for JavaScript.


Solution

  • OpenAI has their own tokenizer, so you probably won't be able to reproduce it exactly in pure R. Instead, I would recommend calling their Python tokenizer, tiktoken, via the reticulate package.

    First, install the tiktoken package via the command line using:

    pip install tiktoken
    

    Then, in R

    library(reticulate)
    tiktoken <- import("tiktoken")

    # Load the tokenizer used by gpt-3.5-turbo
    encoding <- tiktoken$encoding_for_model("gpt-3.5-turbo")

    prompt <- "how do I count the token in R for gpt-3.45-turbo?"
    length(encoding$encode(prompt))
    # [1] 19
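
    If you want to budget for a whole chat request rather than a single string, note that each message in the messages list carries a few tokens of overhead on top of its content. A rough sketch, reusing the encoding object from above; the overhead constants follow OpenAI's cookbook figures for gpt-3.5-turbo-0301 and may differ for newer model snapshots:

    # Counts tokens for a list of chat messages, including per-message
    # overhead. tokens_per_message = 4 and reply_priming = 3 are the
    # cookbook values for gpt-3.5-turbo-0301; treat them as estimates.
    count_chat_tokens <- function(messages, encoding,
                                  tokens_per_message = 4, reply_priming = 3) {
      n <- reply_priming  # every reply is primed with <|start|>assistant<|message|>
      for (msg in messages) {
        n <- n + tokens_per_message
        for (value in msg) {
          n <- n + length(encoding$encode(value))
        }
      }
      n
    }

    messages <- list(list(role = "user", content = prompt))
    count_chat_tokens(messages, encoding)

    This estimates the tokens consumed by the request itself; the completion tokens come on top and count against the same rate limit.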