I am planning to run this code, but I would like to know how many tokens the bot will consume (to save on cost).
import os
from embedchain import App
# Create a bot instance
os.environ["OPENAI_API_KEY"] = "YOUR API KEY"
elon_bot = App()
# Embed online resources
elon_bot.add("web_page", "https://en.wikipedia.org/wiki/Elon_Musk")
elon_bot.add("web_page", "https://tesla.com/elon-musk")
elon_bot.add("youtube_video", "https://www.youtube.com/watch?v=MxZpaJK74Y4")
# Query the bot
elon_bot.query("How many companies does Elon Musk run?")
# Answer: Elon Musk runs four companies: Tesla, SpaceX, Neuralink, and The Boring Company
From the OpenAI tokenizer page:
One token generally corresponds to ~4 characters of text for common English text. This translates to roughly ¾ of a word (so 100 tokens ~= 75 words). Ref: https://platform.openai.com/tokenizer
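As a rough illustration of that rule of thumb, the sketch below estimates a token count from character length alone. The function name `estimate_tokens` and the constant `CHARS_PER_TOKEN` are made up for this example, and the result is only an approximation: the real count depends on the model's tokenizer and on the text itself.

```python
import math

# Rule-of-thumb average for common English text; not an exact value.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count from the character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


question = "How many companies does Elon Musk run?"
print(f"Estimated tokens: {estimate_tokens(question)}")
```

This is handy for a quick budget check before you reach for an actual tokenizer.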
You can use tiktoken to calculate the exact number of tokens for a particular model:
https://github.com/openai/tiktoken
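import tiktoken
text = "blah blah .. more text here."
# Load the GPT-2 encoding; for a specific model, use e.g. tiktoken.encoding_for_model("gpt-3.5-turbo")
encoding = tiktoken.get_encoding("gpt2")
# encode() returns a list of token ids, so its length is the token count
token_count = len(encoding.encode(text))
print(f"Token count: {token_count}")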
For embedchain, you need to figure out how to extract the text from web pages that you added, and pass it to tiktoken to count.
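One possible way to do that is sketched below, assuming you have already downloaded the page HTML yourself. The names `VisibleTextParser`, `html_to_text`, and `count_tokens` are hypothetical helpers, not part of embedchain or tiktoken, and embedchain's own loaders may extract and chunk the text differently, so treat the result as an estimate. The counter falls back to the ~4 characters per token heuristic unless you pass it an encoding object with an `encode` method, such as one returned by `tiktoken.get_encoding`.

```python
import math
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document as one string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def count_tokens(text: str, encoding=None) -> int:
    """Count tokens with the given encoding, or estimate if none is given."""
    if encoding is not None:
        return len(encoding.encode(text))
    # Fallback assumption: roughly 4 characters per token.
    return math.ceil(len(text) / 4)


sample_html = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><p>Elon Musk runs several companies.</p></body></html>"
)
text = html_to_text(sample_html)
print(f"Text: {text!r}, tokens: {count_tokens(text)}")
```

With tiktoken installed, you could call `count_tokens(text, tiktoken.get_encoding("cl100k_base"))` to get a tokenizer-based count instead of the estimate. The YouTube transcript would need its own extraction step, which this sketch does not cover.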