
Retrain a custom language model with the current spaCy version (compatibility issues)


I have installed spaCy with two language models: vi_spacy (not mine), a custom Vietnamese language model for spaCy from GitHub, and the Japanese ja_core_news_trf model. I first installed ja_core_news_trf with the python -m spacy download ja_core_news_trf command in the Anaconda command line, and it worked without a problem. Then I installed vi_spacy from the command line, and when I tried it out, it worked. But when I tried the Japanese model again, it didn't work anymore.

Each time I get this error:

OSError: [E050] Can't find model 'ja_core_news_trf'. It doesn't seem to be a Python package or a valid path to a data directory.

This happens even though ja_core_news_trf is listed as installed when I run pip list. After investigating, I found out that vi_spacy only works with spaCy v3.0.x (I have v3.0.8 installed), but ja_core_news_trf needs spaCy >=3.2.0,<3.3.0 and is incompatible with the current version. After running python -m spacy info, I get this warning:

UserWarning: [W095] Model 'ja_core_news_trf' (3.2.0) requires spaCy >=3.2.0,<3.3.0 and is incompatible with the current version (3.0.8). This may lead to unexpected results or runtime errors. To resolve this, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
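pip list only tells you that a package is installed, not whether the spaCy range it declares matches the spaCy you actually have. As a rough cross-check, here is a minimal stdlib sketch that uses importlib.metadata to print the installed spaCy version next to the spaCy requirement a pipeline package declares (assuming the package lists spaCy among its requirements, as spaCy-packaged pipelines normally do). The helper names are hypothetical, and python -m spacy validate remains the proper tool for this.

```python
import re
from importlib import metadata
from typing import Optional


def installed_version(dist: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def declared_spacy_requirement(dist: str) -> Optional[str]:
    """The 'spacy...' requirement string a distribution declares, if any."""
    try:
        requirements = metadata.requires(dist) or []
    except metadata.PackageNotFoundError:
        return None
    for req in requirements:
        # Extract the project name at the start of e.g. "spacy<3.3.0,>=3.2.0"
        # so that "spacy-transformers..." is not mistaken for spaCy itself.
        match = re.match(r"[A-Za-z0-9._-]+", req)
        if match and match.group(0).lower().replace("_", "-") == "spacy":
            return req
    return None


if __name__ == "__main__":
    print("spaCy installed:", installed_version("spacy"))
    for pipeline in ("ja_core_news_trf", "vi_core_news_lg"):
        print(pipeline, installed_version(pipeline), declared_spacy_requirement(pipeline))
```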

After running python -m spacy validate, I get this:

    ja_core_news_trf   >=3.2.0,<3.3.0   3.2.0     --> n/a
    vi_core_news_lg    >=3.0.5,<3.1.0   0.0.1     ✔

    The following packages are custom spaCy pipelines or not available for spaCy v3.0.8:
    ja_core_news_trf
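The validate output boils down to checking one spaCy version against each package's declared range. A simplified stdlib sketch of that check makes the conflict visible: the two ranges do not overlap, so no single spaCy release satisfies both. It handles plain numeric versions only (no pre-release tags); real tools such as the packaging library implement the full specifier rules.

```python
import operator
from typing import Tuple

# Two-character operators must be tried before one-character ones.
_OPS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    ("!=", operator.ne),
    (">", operator.gt),
    ("<", operator.lt),
]


def _parse(version: str) -> Tuple[int, ...]:
    """'3.0.8' -> (3, 0, 8); plain numeric versions only."""
    return tuple(int(part) for part in version.strip().split("."))


def _pad(a: Tuple[int, ...], b: Tuple[int, ...]) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Pad with zeros so that e.g. (3, 0) compares equal to (3, 0, 0)."""
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)), b + (0,) * (n - len(b))


def satisfies(version: str, spec: str) -> bool:
    """True if version meets every comma-separated clause of spec."""
    current = _parse(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in _OPS:
            if clause.startswith(symbol):
                lhs, rhs = _pad(current, _parse(clause[len(symbol):]))
                if not compare(lhs, rhs):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


if __name__ == "__main__":
    for spacy_version in ("3.0.8", "3.2.0"):
        for name, spec in (("ja_core_news_trf", ">=3.2.0,<3.3.0"),
                           ("vi_core_news_lg", ">=3.0.5,<3.1.0")):
            print(spacy_version, name, satisfies(spacy_version, spec))
```

With spaCy 3.0.8 only the Vietnamese range matches, and with 3.2.0 only the Japanese one, which is exactly the situation validate reports.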

So my question is: how can I retrain the custom Vietnamese model with the current spaCy version? Of course I tried to contact the developer, but he doesn't reply, so I wanted to do it myself if it's possible.


Solution

  • In nearly all cases, spaCy v3 models are forward-compatible with newer v3 releases of spaCy. So keep the newer spaCy version that ja_core_news_trf requires, download ja_core_news_trf, and then install the Vietnamese model with pip install --no-deps so that pip doesn't downgrade spaCy to the older version the Vietnamese model declares as a dependency.

    You'll get a warning on load that the older model may be incompatible, but test it on your data: as long as its performance matches what you got with the older spaCy version, it should be fine to use.

    See: https://spacy.io/usage/v3-2#upgrading

    You can only retrain the model if you have access to the original training data.
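The install workflow from the answer can be written out as explicit commands. This is a sketch rather than a tested recipe: VI_PACKAGE is a hypothetical placeholder because the exact file or URL of the vi_spacy package isn't given here, and the spaCy pin simply mirrors the range ja_core_news_trf 3.2.0 reports above. The commands are only printed unless you call run_steps.

```python
import subprocess
import sys
from typing import List

# Hypothetical placeholder: replace with the vi_spacy archive/wheel you actually use.
VI_PACKAGE = "path/to/vi_core_news_lg-package"


def install_steps(vi_package: str) -> List[List[str]]:
    """Return the commands for the workflow, in order."""
    py = sys.executable
    return [
        # 1. Keep spaCy in the range ja_core_news_trf 3.2.0 requires.
        [py, "-m", "pip", "install", "spacy>=3.2.0,<3.3.0"],
        # 2. Download the Japanese pipeline matching the installed spaCy.
        [py, "-m", "spacy", "download", "ja_core_news_trf"],
        # 3. Install the Vietnamese pipeline WITHOUT its dependencies, so pip
        #    does not downgrade spaCy to the <3.1.0 range it declares.
        [py, "-m", "pip", "install", "--no-deps", vi_package],
    ]


def run_steps(steps: List[List[str]]) -> None:
    """Run each command, stopping at the first one that fails."""
    for cmd in steps:
        print("$", " ".join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in install_steps(VI_PACKAGE):
        print(" ".join(cmd))
    # run_steps(install_steps(VI_PACKAGE))  # uncomment to actually run them
```

Keep in mind that --no-deps skips all of the package's declared dependencies, not just spaCy, so anything else the Vietnamese model needs must already be installed (for example, from the earlier installation).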
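If you do have the original training data, retraining in spaCy v3 is driven by a config file and the spacy train command. A minimal sketch with hypothetical file names: a pipeline saved with spaCy v3 carries its config, which you can export via nlp.config, and the training and dev data must be in spaCy's binary .spacy format (python -m spacy convert can produce it from formats such as CoNLL-U). The path overrides assume the config defines paths.train and paths.dev, as configs generated by spacy init config do; a config written for spaCy 3.0 may still need adjusting for the newer version, so treat this as a starting point.

```python
import sys
from pathlib import Path
from typing import List


def export_config(pipeline: str, dest: Path) -> None:
    """Write the config stored in an installed pipeline to dest."""
    import spacy  # imported lazily so the command builder below works without spaCy

    nlp = spacy.load(pipeline)
    nlp.config.to_disk(dest)


def train_command(config: Path, output_dir: Path, train_data: Path, dev_data: Path) -> List[str]:
    """Build a `python -m spacy train` call that overrides the data paths."""
    return [
        sys.executable, "-m", "spacy", "train", str(config),
        "--output", str(output_dir),
        "--paths.train", str(train_data),  # hypothetical .spacy file with training docs
        "--paths.dev", str(dev_data),      # hypothetical .spacy file with held-out docs
    ]


if __name__ == "__main__":
    # export_config("vi_core_news_lg", Path("config.cfg"))  # needs spaCy and the installed model
    cmd = train_command(Path("config.cfg"), Path("output"), Path("train.spacy"), Path("dev.spacy"))
    print(" ".join(cmd))
```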