Tags: python, named-entity-recognition, spacy-3

SpaCy custom NER training AttributeError: 'DocBin' object has no attribute 'to_disk'


I want to train a custom NER model using spaCy v3. I prepared my training data and ran this script:

import spacy
from spacy.tokens import DocBin
from tqdm import tqdm  # progress bar used in the loop below

nlp = spacy.blank("en")  # create a blank English pipeline
db = DocBin()  # create a DocBin object

# TRAIN_DATA is assumed to be a list of (text, {"entities": [(start, end, label), ...]}) pairs
for text, annot in tqdm(TRAIN_DATA):
    doc = nlp.make_doc(text)  # create a Doc object from the text
    ents = []
    for start, end, label in annot["entities"]:  # character offsets
        span = doc.char_span(start, end, label=label)
        if span is not None:  # char_span returns None if offsets do not match token boundaries
            ents.append(span)
    doc.ents = ents  # label the text with the entities
    db.add(doc)

db.to_disk("./train.spacy")  # save the DocBin object

Running it prints this error:

AttributeError: 'DocBin' object has no attribute 'to_disk'

Solution

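  • Make sure you are really using spaCy 3. DocBin.to_disk is only available from spaCy v3.0 onward, so this error usually means an older 2.x release is installed in the environment that runs the script.

    You can check the installed version from the console by running python -c "import spacy; print(spacy.__version__)"

    If it reports 2.x, install a 3.x release from the command line in your Python environment, for example pip install spacy==3.0.6, and then run the following in the Python console: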

    import spacy
    from spacy.tokens import DocBin
    
    nlp = spacy.blank("en") # load a new spacy model
    db = DocBin() # create a DocBin object
    
    # annotation loop omitted here to isolate the to_disk call
    
    db.to_disk("./train.spacy") # save the docbin object
    

    With spaCy 3 installed, this should run without errors.
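The version check above can also be done from inside a script, which helps when the console and the script might be using different environments. The following is only a minimal sketch: the helper names (major_version, installed_spacy_version, supports_docbin_to_disk) are made up for illustration and are not part of spaCy, and it assumes that a major version of 3 or higher is what matters for DocBin.to_disk.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(version_string: str) -> int:
    """Return the leading major number of a version string such as '3.0.6'."""
    return int(version_string.split(".")[0])


def installed_spacy_version() -> Optional[str]:
    """Return the spaCy version installed in the running environment, or None."""
    try:
        return version("spacy")
    except PackageNotFoundError:
        return None


def supports_docbin_to_disk(version_string: Optional[str]) -> bool:
    """Assumption: DocBin.to_disk needs spaCy major version 3 or higher."""
    return version_string is not None and major_version(version_string) >= 3


found = installed_spacy_version()
if supports_docbin_to_disk(found):
    print(f"spaCy {found} found: DocBin.to_disk should be available")
else:
    print(f"spaCy version found: {found}; install a 3.x release, e.g. pip install spacy==3.0.6")
```

Running this with the same interpreter that runs the training script shows which spaCy version that interpreter actually sees.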