While using the Chat Completions API, I learned that if you want the model to have access to the chat history, you need to include the full trail of user questions and API answers (including the system messages) with every new question.
With the Assistants API, you don't need to do that; it remembers the chat history for you.
My question is, what happens to token consumption in the case of the Assistants API? Would all the past messages be included in the token consumption?
Token consumption in the Assistants API can be very high if you keep using the same thread for a long time, because the thread stores the message history, and the whole thread is passed to the model every time you ask a new question on that thread.
After a while, even a short message to the Assistants API can cost a lot. See this past discussion:
/ ... /
The message contains around 1000 tokens, checked via https://platform.openai.com/tokenizer
/ ... /
This code takes around 250,000 tokens to complete. The image shows today's token usage for three requests.
What the developer missed is that while the most recent message might contain 1,000 tokens, the hundreds of earlier messages in the thread, both the user's questions and the assistant's answers, are also sent to the Assistants API with each run.
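The growth described above can be sketched with a toy model (the per-message token count and turn count here are made up for illustration, not taken from the discussion): if every run sends all prior messages as input, the input cost of each run grows linearly with the number of turns, and the cumulative cost grows quadratically.

```python
# Toy model of input-token growth on a long-lived thread.
# Illustrative assumption: every run sends the entire thread so far,
# and each message costs a fixed number of tokens.

def prompt_tokens_per_run(tokens_per_message: int, num_turns: int) -> list[int]:
    """Tokens billed as input on each run of a steadily growing thread."""
    history = 0
    costs = []
    for _ in range(num_turns):
        history += tokens_per_message  # the new user message joins the thread
        costs.append(history)          # the whole thread is sent as input
        history += tokens_per_message  # the assistant's reply joins the thread
    return costs

costs = prompt_tokens_per_run(tokens_per_message=1_000, num_turns=100)
print(costs[0])    # first run: only the new message
print(costs[-1])   # 100th run: the entire accumulated history
print(sum(costs))  # cumulative input tokens across all 100 turns
```

With these made-up numbers, the 100th run alone sends nearly 200,000 input tokens even though the new message is still only 1,000 tokens, which is the effect the developer above was seeing.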
There is, however, a limit of 100,000 messages per thread. As stated in the official OpenAI documentation:
The contents of the messages your users or applications create are added as Message objects to the Thread. Messages can contain both text and files. There is a limit of 100,000 Messages per Thread and we smartly truncate any context that does not fit into the model's context window.