I would like to train my own Stanford NER CRF model (see https://nlp.stanford.edu/software/crf-faq.shtml#a). I have training, validation and test datasets.
Inside the properties file I can specify the paths to my training and test datasets. How can I use the validation set during training and then evaluate only on the test set? What is the correct way to use the training, validation and test sets?
Thank you for your help!
The Stanford NLP CRF does not use validation data to choose the best model during training. Accordingly, you can use your dev (validation) set however you like. One possibility is to train several models with different hyperparameters and pick the best one by comparing their scores on the dev set. Another possibility is to add the dev set to the training data.
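If you go with the second option, one simple way to do it is to concatenate the two files before training. The following is only a sketch: it assumes both files use the same tab-separated column layout and that a blank line is an acceptable separator between the two parts. The function name `merge_datasets` and the file names are made up for illustration.

```python
from pathlib import Path
from typing import Iterable


def merge_datasets(paths: Iterable[str], out_path: str) -> None:
    """Concatenate column-format files, separated by one blank line."""
    chunks = []
    for path in paths:
        # Drop trailing newlines so the separator is exactly one blank line.
        text = Path(path).read_text(encoding="utf-8").rstrip("\n")
        if text:
            chunks.append(text)
    Path(out_path).write_text("\n\n".join(chunks) + "\n", encoding="utf-8")


# Hypothetical usage: point trainFile at the merged file afterwards.
# merge_datasets(["train.tsv", "dev.tsv"], "train_plus_dev.tsv")
```

Keep in mind that once the dev set has been merged into the training data, it can no longer serve as an independent check for choosing hyperparameters.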
The testFile flag controls which dataset you get scores for. If you decide to use your dev set for choosing the best hyperparameters, set testFile to the dev set path while you compare the candidate models. Once you have settled on a configuration, set testFile to the test set path to get the final score.
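A rough sketch of that workflow, written as a small Python helper that writes one properties file per candidate and builds the corresponding `java` commands. The class name and the `-prop`, `-loadClassifier` and `-testFile` flags are the ones used in the FAQ linked in the question; everything else (the jar location, file names, helper function names, and the choice of `maxNGramLeng` as the hyperparameter to vary) is an assumption for illustration only.

```python
from pathlib import Path
from typing import Dict, List

CRF_MAIN = "edu.stanford.nlp.ie.crf.CRFClassifier"


def write_props(path: str, train_file: str, model_path: str,
                features: Dict[str, str]) -> None:
    """Write a properties file for one candidate configuration."""
    lines = [f"trainFile = {train_file}", f"serializeTo = {model_path}"]
    lines += [f"{key} = {value}" for key, value in features.items()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def train_command(jar: str, props_path: str) -> List[str]:
    """Command that trains a model from a properties file."""
    return ["java", "-cp", jar, CRF_MAIN, "-prop", props_path]


def eval_command(jar: str, model_path: str, eval_file: str) -> List[str]:
    """Command that scores a saved model on the given file."""
    return ["java", "-cp", jar, CRF_MAIN,
            "-loadClassifier", model_path, "-testFile", eval_file]


if __name__ == "__main__":
    jar = "stanford-ner.jar"                            # assumed jar location
    base = {"map": "word=0,answer=1", "useWord": "true",
            "useNGrams": "true"}
    for n in ("4", "6"):                                # hypothetical sweep
        props, model = f"ngram{n}.prop", f"ngram{n}.ser.gz"
        write_props(props, "train.tsv", model, {**base, "maxNGramLeng": n})
        print(" ".join(train_command(jar, props)))
        # While comparing candidates, score on the dev set only.
        print(" ".join(eval_command(jar, model, "dev.tsv")))
    # After picking a winner (say ngram6), score it once on the test set.
    print(" ".join(eval_command(jar, "ngram6.ser.gz", "test.tsv")))
```

The script only prints the commands; you could run them by hand or pass each list to `subprocess.run(cmd, check=True)`. Leaving testFile out of the properties file and evaluating in a separate step keeps the test set untouched until the very end.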