node.js · machine-learning · nlp · tokenize · openai-api

Is there a JavaScript implementation of cl100k_base tokenizer?


OpenAI's new embeddings API uses the cl100k_base tokenizer. I'm calling it from the Node.js client, but I don't see an easy way to slice my strings so they stay within OpenAI's limit of 8192 tokens.

This would be trivial if I could first encode the string, slice it to the limit, then decode it and send it to the API.


Solution

  • @dqbd/tiktoken supports the cl100k_base encoding.