
Setting validation data in spaCy NER training


Is it possible to train spaCy NER with validation data, or to split off part of the data as a validation set, like validation_split in Keras's model.fit? Thanks

import random
from spacy.util import minibatch, compounding
from tqdm import tqdm

# nlp, other_pipes, n_iter, train_data_list and optimizer come from the rest of the training script
with nlp.disable_pipes(*other_pipes):  # only train NER
    for itn in tqdm(range(n_iter)):
        random.shuffle(train_data_list)
        losses = {}
        # batch up the examples using spaCy's minibatch
        batches = minibatch(train_data_list, size=compounding(8., 64., 1.001))
        for batch in batches:
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, sgd=optimizer, drop=0.35,
                       losses=losses)
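Keras's validation_split takes the last fraction of the data (before any shuffling) as validation data. A minimal sketch of doing something similar by hand before running the loop above, assuming train_data_list holds (text, annotations) tuples as in spaCy v2's training examples. split_train_dev is a hypothetical helper, not a spaCy API, and unlike Keras it shuffles with a fixed seed before splitting:

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_dev(data: Sequence[T], validation_split: float = 0.2,
                    seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `data` and hold out `validation_split` of it.

    Hypothetical helper for illustration; not part of spaCy's API.
    """
    if not 0.0 <= validation_split < 1.0:
        raise ValueError("validation_split must be in [0, 1)")
    items = list(data)                   # copy, so the caller's list is untouched
    random.Random(seed).shuffle(items)   # fixed seed gives a reproducible split
    n_dev = int(len(items) * validation_split)
    # first n_dev shuffled examples become dev, the rest stay for training
    return items[n_dev:], items[:n_dev]


# Dummy data in the (text, annotations) shape used by spaCy v2's NER examples
data = [(f"example {i}", {"entities": []}) for i in range(10)]
train_part, dev_part = split_train_dev(data, validation_split=0.2)
print(len(train_part), len(dev_part))  # 8 2
```

The dev part is then kept out of the training loop and only used for scoring after each iteration.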

Solution

  • Use the spacy train CLI instead of the demo script:

    spacy train lang /path/to/output train.json dev.json
    

    The validation data is used to choose the best model from the training iterations and optionally for early stopping.

    The main task is converting your data into spaCy's JSON training format; see: https://stackoverflow.com/a/59209377/461847
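If you would rather keep the custom training loop than switch to the CLI, you can score the held-out data yourself after each iteration. A minimal sketch of micro-averaged, exact-match entity precision/recall/F1, assuming gold and predicted entities are given per document as sets of (start_char, end_char, label) tuples. entity_prf is a hypothetical helper, not spaCy's own scorer. With a trained pipeline, the predicted set for one text could be built as {(e.start_char, e.end_char, e.label_) for e in nlp(text).ents}.

```python
from typing import Iterable, Set, Tuple

Entity = Tuple[int, int, str]  # (start_char, end_char, label)


def entity_prf(gold_docs: Iterable[Set[Entity]],
               pred_docs: Iterable[Set[Entity]]) -> Tuple[float, float, float]:
    """Micro-averaged exact-match precision, recall and F1 (hypothetical helper)."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        tp += len(gold & pred)  # same span and same label
        fp += len(pred - gold)  # predicted, but not in the gold data
        fn += len(gold - pred)  # gold entity the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


gold = [{(0, 5, "ORG"), (12, 21, "GPE")}, {(0, 4, "PERSON")}]
pred = [{(0, 5, "ORG")}, {(0, 4, "PERSON"), (9, 15, "GPE")}]
precision, recall, f1 = entity_prf(gold, pred)
```

zip stops at the shorter input, so the gold and predicted lists must be aligned document by document. Tracking F1 across iterations lets you keep the best model or stop early, which is roughly what the CLI does with the dev data.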