How do I stop response generation in the OpenAI library for Python?

I am using Chat Completions in the OpenAI library for Python, something like this:

self.__response = self.client.chat.completions.create(
    model='gpt-4',
    messages=messages,
    stream=True
)

After this I just loop through the chunks:

    for chunk in self.__response:
        text = chunk.choices[0].delta.content
        # Processing text here

Is it enough to just break out of the loop to stop the server from generating the rest of the response and wasting tokens when I see that the response does not meet my expectations? Or is there a correct way to achieve this?

Solution

You are charged for every token (a word or part of a word) that the API generates, whether or not you process it. So breaking out of the loop early stops your code from processing further tokens, but by itself it does not stop you from being charged for whatever the server generates.
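If you do bail out early, it is still tidy to close the stream so the HTTP connection is released instead of left half-read. Here is a minimal sketch of that pattern. The helper names (consume_until_rejected, is_acceptable, fake_stream) are made up for illustration, and the fake stream only mimics the chunk shape from the question so the example runs without an API key. As far as I know the stream object returned by the v1 library has a close() method, but the code checks for it rather than assuming. Whether closing the connection also makes the server stop generating (and billing) is an assumption I cannot confirm here.

```python
from types import SimpleNamespace


def consume_until_rejected(stream, is_acceptable):
    """Collect streamed text and stop as soon as is_acceptable(text) is False."""
    parts = []
    try:
        for chunk in stream:
            if not chunk.choices:
                continue
            delta = chunk.choices[0].delta.content
            if delta is None:  # some chunks carry no text (e.g. role-only or final)
                continue
            parts.append(delta)
            if not is_acceptable("".join(parts)):
                break  # stop reading; the finally block closes the stream
    finally:
        # Close the stream if it supports it, so the connection is released.
        close = getattr(stream, "close", None)
        if callable(close):
            close()
    return "".join(parts)


def fake_stream(pieces):
    """Hypothetical stand-in for the API stream, yielding chunk-shaped objects."""
    for piece in pieces:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])
```

With the real client you would pass the object returned by client.chat.completions.create(..., stream=True) in place of fake_stream(...), and supply your own is_acceptable check.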

You can cap the cost with 'max_tokens', which limits how many tokens a single response may generate. The downside is that every response is then cut off at that lower limit, even when a particular response is exactly the one you wanted in full.
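A rough sketch of the 'max_tokens' approach, again runnable without network access. build_request and was_truncated are hypothetical helper names and the default of 256 is an arbitrary value. The only API facts assumed are that chat completions accept a max_tokens parameter and that a response cut off by the cap reports finish_reason == "length".

```python
from types import SimpleNamespace


def build_request(messages, max_tokens=256):
    """Keyword arguments for client.chat.completions.create with a token cap."""
    return {
        "model": "gpt-4",
        "messages": messages,
        "stream": True,
        "max_tokens": max_tokens,  # upper bound on generated tokens for this call
    }


def was_truncated(chunks):
    """True if any chunk reports that the cap cut the response short."""
    return any(
        c.choices and c.choices[0].finish_reason == "length" for c in chunks
    )
```

You would call client.chat.completions.create(**build_request(messages)) and keep the chunks you receive. If was_truncated(chunks) is true, the response hit the cap, and the only way to get the rest is another (billed) request.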