I trained fasttext embeddings and saved them as a .vec
file.
I want to use these for my spacy NER model. Is there a difference between
python -m spacy train en [new_model] [train_data] [dev_data] --pipeline ner --base-model embeddings.vec
and
python -m spacy train en [new_model] [train_data] [dev_data] --pipeline ner --vectors embeddings.vec
?
Both methods produce nearly identical training loss, F score, etc.
If you need to initialize a spacy model with vectors, use spacy init-model like this, where lg is the language code:
spacy init-model lg model_dir -v embeddings.vec -vn my_custom_vectors
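As background, the .vec file that init-model reads is the plain word2vec text format: a header line with the vocabulary size and vector dimension, then one word per line followed by its float values. A minimal stdlib-only sketch of parsing that format (for illustration only; this is not spacy's actual parser):

```python
import io

def read_vec(file_obj):
    """Parse a word2vec-style .vec stream into {word: [floats]}."""
    # First line is the header: "<n_words> <dim>"
    n_words, dim = map(int, file_obj.readline().split())
    vectors = {}
    for line in file_obj:
        parts = line.rstrip("\n").split(" ")
        # One word, then exactly `dim` float values
        vectors[parts[0]] = [float(v) for v in parts[1 : 1 + dim]]
    return vectors

# Toy two-word, three-dimensional example in place of a real embeddings.vec:
toy = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
vectors = read_vec(toy)
```

init-model converts this text table into spacy's binary vector storage inside model_dir, which is why the trained pipelines in both commands end up with the same vectors.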
Once you have the vectors saved as part of a spacy model:
--vectors loads the vectors from the provided model, so the initial model is spacy.blank("lg") + vectors.

--base-model loads everything (tokenizer, pipeline components, vectors) from the provided model, so the initial model is spacy.load(model).
If the provided model doesn't have any pipeline components in it, the only potential difference is in the tokenizer settings resulting from spacy.blank("lg"), which can vary a little between individual spacy versions.
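The difference can be sketched in Python (assuming spaCy v2-era behavior; model_dir and the toy vector table below are illustrative stand-ins for a vectors-only model produced by init-model):

```python
import numpy
import spacy
from spacy.vectors import Vectors

# Fake a vectors-only model on disk: a blank pipeline plus a toy
# 2-word, 4-dimensional vector table (values are placeholders).
nlp = spacy.blank("en")
nlp.vocab.vectors = Vectors(
    data=numpy.zeros((2, 4), dtype="f"),
    keys=["hello", "world"],
    name="my_custom_vectors",
)
nlp.to_disk("model_dir")

# --vectors model_dir: initial model is spacy.blank("en") + the vectors.
nlp_vectors = spacy.blank("en")
nlp_vectors.vocab.vectors = spacy.load("model_dir").vocab.vectors

# --base-model model_dir: initial model is spacy.load("model_dir") itself.
nlp_base = spacy.load("model_dir")

# With no pipeline components saved, the two starting points match:
# no components either way, and the same vector table.
assert nlp_base.pipe_names == nlp_vectors.pipe_names == []
assert nlp_base.vocab.vectors.shape == nlp_vectors.vocab.vectors.shape == (2, 4)
```

This is why the two training commands report nearly identical losses and F scores: once the vectors-only model exists, both flags hand the trainer the same starting point apart from possible tokenizer-setting differences.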