nlp, named-entity-recognition, pre-trained-model, deeppavlov

Retrain the multi-language NER model (ner_ontonotes_bert_mult) from DeepPavlov with a dataset in a different language


I have successfully installed the multi-language NER model from DeepPavlov (ner_ontonotes_bert_mult). I want to retrain this model on new data in the Albanian language, formatted as the documentation page suggests. Is it possible to retrain the multi-language NER model from DeepPavlov on data in a different language, or does retraining only work with English data?


Solution

  • Yes, you can fine-tune the model on any language that was used to train Multilingual BERT; see the list at https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages. Albanian is on that list.

    It is also possible to fine-tune on a language that is not on the list, provided the multilingual vocabulary covers it well, that is, most words split into a few known subword pieces rather than mapping to the unknown token.
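
    One rough way to judge vocabulary coverage is to split a sample of words from your corpus with a WordPiece-style tokenizer and look at how often a word becomes the unknown token and how many pieces an average word needs. The sketch below is a simplified greedy longest-match splitter written for illustration only; it is not DeepPavlov's or BERT's own tokenizer, and the names `wordpiece`, `coverage_report`, and the toy vocabulary are assumptions. For a real check you would load the `vocab.txt` shipped with the multilingual BERT checkpoint into the set instead.

```python
from typing import Dict, Iterable, List, Set

UNK = "[UNK]"  # unknown-token marker used by BERT vocabularies


def wordpiece(word: str, vocab: Set[str]) -> List[str]:
    """Greedy longest-match-first split of a single word.

    Simplified sketch: no lowercasing, accent stripping, or punctuation
    handling. Non-initial pieces carry the "##" continuation prefix.
    """
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # Some part of the word cannot be covered: whole word is unknown.
            return [UNK]
        pieces.append(match)
        start = end
    return pieces


def coverage_report(words: Iterable[str], vocab: Set[str]) -> Dict[str, float]:
    """Return the share of unknown words and the mean pieces per word."""
    words = list(words)
    if not words:
        raise ValueError("need at least one word")
    splits = [wordpiece(w, vocab) for w in words]
    unknown = sum(1 for s in splits if s == [UNK])
    total_pieces = sum(len(s) for s in splits)
    return {
        "unk_rate": unknown / len(words),
        "pieces_per_word": total_pieces / len(words),
    }


# Toy vocabulary for illustration only (hypothetical entries).
toy_vocab = {"tiran", "##a", "shqip", "##ëri"}
report = coverage_report(["tirana", "shqipëri", "xyz"], toy_vocab)
print(report)
```

    A low unknown rate and a small number of pieces per word suggest the vocabulary represents the language reasonably; a high value for either is a sign that fine-tuning may work poorly. The thresholds are a judgment call and depend on your data.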