I'm fine-tuning a SetFit model on a French dataset, following the Hugging Face guide. The guide makes this point, which I didn't quite understand:
"🌎 Multilingual support: SetFit can be used with any Sentence Transformer on the Hub, which means you can classify text in multiple languages by simply fine-tuning a multilingual checkpoint."
Does that mean I must find an already fine-tuned French SetFit model when loading the model? As in, replace "paraphrase-mpnet-base-v2" below with a French one?
from setfit import SetFitModel

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
What the guide is saying is that multilingual models fine-tuned with the SetFit method generalize well even to languages they did not see during SetFit fine-tuning. This is generally true of multilingual language models, but it is worth stating explicitly, particularly for SetFit, a method that typically works with very small datasets (datasets that may well not be multilingual).
The finding is supported by the paper mentioned in the guide, where the researchers show that a model fine-tuned on English data using SetFit performs well on a variety of languages (see Table 4).
What I would take from it is this: if you take a multilingual checkpoint (e.g. sentence-transformers/paraphrase-multilingual-mpnet-base-v2) and fine-tune it on French, it will perform well on French and probably also on other languages. If you plan to use the fine-tuned model only on French texts, you can certainly try fine-tuning a specifically French model, but it's not true that you must do this.
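To make that concrete, here is a minimal sketch of fine-tuning the multilingual checkpoint on French data (assuming setfit >= 1.0, which provides Trainer and TrainingArguments; the tiny French dataset below is made up purely for illustration):

```python
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Made-up few-shot French dataset, just for illustration.
train_dataset = Dataset.from_dict({
    "text": [
        "Ce film était excellent !",
        "Quel navet, je me suis ennuyé du début à la fin.",
        "Une lecture passionnante, je le recommande.",
        "Service décevant, je ne reviendrai pas.",
    ],
    "label": [1, 0, 1, 0],
})

# Start from the multilingual checkpoint instead of the English-only one.
model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"
)

args = TrainingArguments(batch_size=16, num_epochs=1)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()

# The fine-tuned model works on French and, per the paper's Table 4,
# often transfers to languages the fine-tuning data never covered.
print(model.predict(["J'ai adoré ce restaurant.", "Ich fand den Film langweilig."]))
```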
That said, if a specifically French sentence transformer exists on the Hub and you want to use your model only on French texts, I would recommend using it: not because you must, but because it might perform better than the multilingual model.
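Swapping one in is the same one-line change as above; the checkpoint name here is just a placeholder, not a recommendation of a specific model:

```python
# Placeholder name: substitute whichever French Sentence Transformer you find on the Hub.
model = SetFitModel.from_pretrained("some-org/french-sentence-transformer")
```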