openai-api

OpenAI fine-tuning: "each of the classes must start with a different token" error


I am trying to run a fine-tune similar to the one in the OpenAI Cookbook example for a multiclass classification problem. After preparing the train and validation JSONL files with fine_tunes.prepare_data, I ran the recommended fine_tunes.create command and got the following error:

If compute_classification_metrics is True, each of the classes must start with a different token. You can view your class tokenizations at https://beta.openai.com/tokenizer?view=bpe.. Fine-tune failed. For help, please contact OpenAI and include your fine-tune ID.


Solution

  • It looks like this error occurs when the completion values are longer than a single token, in which case two classes can begin with the same first token, which is what the error message complains about. After I changed the completion values to numerical IDs so that each one is a single token, the fine-tune ran fine.

    I'm not sure why the prepare_data step itself didn't report any error about this, given that I used the openai command-line tool to prepare the data.
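The relabeling described above can be sketched roughly as follows. This is only an illustration, not the exact script used: the sample records, the " ->" prompt separator, the leading space in each completion, and the helper names build_label_map and relabel_records are all assumptions made for the example. Whether a given numeric ID really is a single token should still be checked in the tokenizer tool linked from the error message.

```python
import json


def build_label_map(labels):
    """Assign each distinct class label a numeric ID string, in sorted order."""
    return {label: str(i) for i, label in enumerate(sorted(set(labels)))}


def relabel_records(records, label_map):
    """Return copies of the records with each completion replaced by ' <id>'."""
    relabeled = []
    for record in records:
        label = record["completion"].strip()
        relabeled.append(
            {"prompt": record["prompt"], "completion": " " + label_map[label]}
        )
    return relabeled


# Hypothetical training records: both class labels begin with the same word,
# so they are likely to share a first token.
records = [
    {"prompt": "Team wins the cup final ->", "completion": " sports news"},
    {"prompt": "Odds shift before the derby ->", "completion": " sports betting"},
    {"prompt": "Striker signs new contract ->", "completion": " sports news"},
]

label_map = build_label_map(r["completion"].strip() for r in records)
relabeled = relabel_records(records, label_map)

# Print the relabeled records as JSONL, one object per line.
for record in relabeled:
    print(json.dumps(record))
```

Keep label_map around after training: the fine-tuned model will return the numeric IDs, and the map is needed to translate them back into the original class names.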