nlp, spacy, named-entity-recognition

Training spaCy NER using the en_trf_bertbaseuncased_lg model


I am currently working on an NER project and would like to improve its performance by trying the spaCy model en_trf_bertbaseuncased_lg, but training fails with KeyError: "[E001] No component 'trf_tok2vec' found in pipeline. Available names: ['ner']". Does spaCy currently not support NER for this language model? Thanks!

    import random
    from pathlib import Path

    from spacy.util import minibatch, compounding
    from tqdm import tqdm

    # nlp, optimizer, train_data_list, n_iter and output_dir are assumed to be defined earlier

    # get names of other pipes to disable them during training
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        for itn in tqdm(range(n_iter)):
            random.shuffle(train_data_list)
            losses = {}
            # batch up the examples using spaCy's minibatch
            batches = minibatch(train_data_list, size=compounding(8., 64., 1.001))
            for batch in batches:
                texts, annotations = zip(*batch)
                nlp.update(texts, annotations, sgd=optimizer, drop=0.35,
                           losses=losses)
            tqdm.write('Iter: ' + str(itn + 1) + ' Losses: ' + str(losses['ner']))
            # save a checkpoint of the pipeline at two fixed iterations
            if itn == 30 or itn == 40:
                output_dir = Path(output_dir)
                if not output_dir.exists():
                    output_dir.mkdir()
                nlp.to_disk(output_dir)

The error is raised on this call:

    nlp.update(texts, annotations, sgd=optimizer, drop=0.35,
               losses=losses)

Solution

  • According to spaCy's documentation for this model, it doesn't support named-entity recognition yet. It only provides these pipeline components:

    • sentencizer
    • trf_wordpiecer
    • trf_tok2vec

    You can list the pipeline components available in a given model like so:

    >>> import spacy
    >>> nlp = spacy.load("en_trf_bertbaseuncased_lg")
    >>> nlp.pipe_names
    ['sentencizer', 'trf_wordpiecer', 'trf_tok2vec']
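The same check can be done before starting a training loop, so a missing component is reported up front rather than in the middle of training. Below is a minimal sketch: `missing_components` is a hypothetical helper (not part of spaCy), and the `pipe_names` list is a stand-in copied from the output above so the snippet runs without the model installed. In real code you would pass `nlp.pipe_names` instead.

```python
from typing import Iterable, List


def missing_components(pipe_names: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required component names that are absent from pipe_names.

    Hypothetical helper for illustration; it is not a spaCy function.
    """
    available = set(pipe_names)
    return [name for name in required if name not in available]


# Stand-in for nlp.pipe_names, copied from the output shown above (assumption:
# in a real script this would come from the loaded model).
pipe_names = ["sentencizer", "trf_wordpiecer", "trf_tok2vec"]

missing = missing_components(pipe_names, ["ner"])
if missing:
    # For this model the check reports that 'ner' is not in the pipeline.
    print("Pipeline is missing:", ", ".join(missing))
```

Running this check first would probably have made it clearer that the loaded model has no `ner` component to train.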