I trained a custom spaCy named entity recognition model to detect biased words in job descriptions. Now that I have trained 8 variations (using different base models, training data, and pipeline settings), I want to evaluate which model performs best.
But I can't find any documentation on how to validate these models. The meta.json file in the output folder contains some recall, precision, and F1 scores, but that is not sufficient.
Does anyone know how to validate these models, or can you link me to the correct documentation? I can't seem to find it anywhere.
NOTE: I'm talking about spaCy v3.x.
During training you should provide evaluation data that is used for validation. It is scored periodically during training and the resulting metrics are printed.
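In spaCy v3 this is wired up in the training config. A minimal sketch of the relevant sections (the corpus paths are placeholders for your own files):

```
[paths]
train = "corpus/train.spacy"
dev = "corpus/dev.spacy"

[training]
# how often (in steps) the dev set is scored during training
eval_frequency = 200
```

The dev corpus referenced under `[paths]` is what produces the scores printed during training and the numbers written to meta.json.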
Note that there's a lot of different terminology in use, but in spaCy there's "training data", which you actually train on, and "evaluation data", which is not trained on and is only used for scoring during the training process. To evaluate on held-out test data you can use the CLI `spacy evaluate` command.
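Assuming you have exported your held-out test set to a `.spacy` file, evaluating a trained pipeline would look something like this (the paths are placeholders for your own output directory and test corpus):

```shell
# score the best checkpoint on a held-out test set and save the metrics
python -m spacy evaluate ./output/model-best ./corpus/test.spacy --output metrics.json
```

Running this against the same test file for each of your 8 model variants gives you comparable precision/recall/F1 numbers, which is the usual way to pick the best one.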
Take a look at the fashion brands example project to see how the "eval" data is configured and used.