OpenAI's new embeddings API uses the cl100k_base tokenizer. I'm calling it from the Node.js client, but I don't see an easy way to slice my strings so they don't exceed OpenAI's limit of 8192 tokens.
This would be trivial if I could first encode the string, slice it to the limit, then decode it and send it to the API.
@dqbd/tiktoken supports the cl100k_base encoding.