Tags: python, python-3.x, spacy, spacy-3

spaCy custom entity training returns nothing


I have descriptions from which I want to extract colours, so I thought I would use spaCy's NER. I have about 8000 lines of data like this:

import spacy
nlp=spacy.load('en_core_web_sm')

# Getting the pipeline component
ner=nlp.get_pipe("ner")

# Each entity is (start_char, end_char, label); end_char is exclusive
Train_data = [
    ("Bermuda shorts anthracite/black", {"entities": [(15, 31, "COL")]}),
    ("Acrylic antique white", {"entities": [(8, 21, "COL")]}),
    ("Pincer black", {"entities": [(7, 12, "COL")]}),
    ("Cable tie black", {"entities": [(10, 15, "COL")]}),
    ("Water pump pliers blue", {"entities": [(18, 22, "COL")]}),
]

My code is

for _, annotations in Train_data:
    for ent in annotations.get("entities"):
        ner.add_label(ent[2])

pipe_exceptions = ["ner", "trf_wordpiecer", "trf_tok2vec"]
unaffected_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]


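from spacy.training import Example

# Keep the existing weights and get an optimizer for further updates
optimizer = nlp.resume_training()
losses = {}  # nlp.update() accumulates per-component losses here

# Update only the NER component; the other pipes stay disabled while training
with nlp.select_pipes(disable=unaffected_pipes):
    for batch in spacy.util.minibatch(Train_data, size=2):
        for text, annotations in batch:
            # Create an Example pairing the raw text with its gold annotations
            doc = nlp.make_doc(text)
            example = Example.from_dict(doc, annotations)
            # Update the model
            nlp.update([example], sgd=optimizer, losses=losses, drop=0.3)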

When I test the model, I get nothing.


doc = nlp("Bill Gates has a anthracite house worth 10 EUR.")
print("Entities", [(ent.text, ent.label_) for ent in doc.ents])

What am I doing wrong? Please help.


Solution

  • There are several problems with your code.

    Where did you save your model?

    There is nothing in your code to indicate that you saved and reloaded your model. Training like that only updates the pipeline object in memory; it doesn't modify the existing model on disk. If you don't save the model after training, it's just gone, which would mean you get no color annotations.
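
    As a minimal sketch of saving and reloading, assuming a blank English pipeline instead of en_core_web_sm, an arbitrary ten passes over two of your examples, and a temporary directory standing in for a real output path (all of these are illustrative choices, not your setup):

```python
import random
import tempfile

import spacy
from spacy.training import Example

# Two examples from the question; offsets are (start_char, end_char, label).
TRAIN_DATA = [
    ("Cable tie black", {"entities": [(10, 15, "COL")]}),
    ("Water pump pliers blue", {"entities": [(18, 22, "COL")]}),
]

# Assumption: start from a blank pipeline so no model download is needed.
nlp = spacy.blank("en")
nlp.add_pipe("ner")
examples = [
    Example.from_dict(nlp.make_doc(text), annots) for text, annots in TRAIN_DATA
]

# initialize() picks up the labels from the examples and returns an optimizer.
optimizer = nlp.initialize(lambda: examples)
for _ in range(10):  # arbitrary number of passes, for illustration only
    random.shuffle(examples)
    nlp.update(examples, sgd=optimizer, drop=0.3)

# The trained weights exist only in memory until they are written out.
with tempfile.TemporaryDirectory() as model_dir:  # stand-in for a real path
    nlp.to_disk(model_dir)
    reloaded = spacy.load(model_dir)  # load the saved pipeline, not en_core_web_sm
    labels = reloaded.get_pipe("ner").labels

print(labels)
```

    The point is the last block: whatever you test with has to be either the same in-memory nlp object you trained or a pipeline loaded from the directory you saved to.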

    Your input doesn't look like your training data!

    Your input is a complete sentence, but your training data consists of isolated phrases. This will result in poor performance, because the model never sees how colors appear alongside, say, verbs. (You would probably still get some annotations, though.)
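
    One way to get sentence-level training examples is to label full sentences automatically from a list of known colors. The sketch below is only an illustration: the COLORS list, the annotate helper, and the two sentences are hypothetical, and regex matching is a rough labelling shortcut that you would still want to review by hand. Computing the offsets in code also avoids counting characters manually.

```python
import re

# Hypothetical color vocabulary; in practice this would come from your data.
COLORS = ["anthracite", "antique white", "black", "blue"]

# Try longer names first so multi-word colors win over shorter overlaps.
_ALTERNATIVES = "|".join(
    re.escape(c) for c in sorted(COLORS, key=len, reverse=True)
)
_PATTERN = re.compile(r"\b(" + _ALTERNATIVES + r")\b", re.IGNORECASE)


def annotate(text, label="COL"):
    """Return (text, {"entities": [...]}) with offsets found by regex."""
    entities = [(m.start(), m.end(), label) for m in _PATTERN.finditer(text)]
    return text, {"entities": entities}


# Hypothetical full sentences that resemble the text seen at prediction time.
sentences = [
    "Bill Gates has an anthracite house worth 10 EUR.",
    "The pliers come with a blue handle.",
]
train_data = [annotate(s) for s in sentences]

for text, annots in train_data:
    for start, end, label in annots["entities"]:
        # Slicing with the offsets should reproduce the color exactly.
        print(repr(text[start:end]), (start, end, label))
```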


    I strongly suggest you go through the spaCy course, which covers training your own NER model. I also strongly recommend you use the v3 config-based training instead of writing your own training loop.
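
    For the config-based workflow, the training data has to be in spaCy's binary .spacy format. A minimal conversion sketch, assuming the same (text, {"entities": ...}) tuples as in the question; the to_docbin helper and the train.spacy file name are made up for illustration:

```python
import spacy
from spacy.tokens import DocBin

TRAIN_DATA = [
    ("Cable tie black", {"entities": [(10, 15, "COL")]}),
    ("Water pump pliers blue", {"entities": [(18, 22, "COL")]}),
]


def to_docbin(nlp, data):
    """Convert (text, {"entities": [...]}) pairs into a DocBin of annotated Docs."""
    doc_bin = DocBin()
    for text, annots in data:
        doc = nlp.make_doc(text)
        spans = []
        for start, end, label in annots["entities"]:
            span = doc.char_span(start, end, label=label)
            if span is None:
                # char_span returns None when offsets miss token boundaries.
                print(f"Skipping misaligned span {(start, end)} in {text!r}")
                continue
            spans.append(span)
        doc.ents = spans
        doc_bin.add(doc)
    return doc_bin


nlp = spacy.blank("en")  # only the tokenizer is needed here
doc_bin = to_docbin(nlp, TRAIN_DATA)
doc_bin.to_disk("train.spacy")  # hypothetical output file name
```

    With a held-out dev.spacy built the same way, the config and the training run would then come from the command line, roughly:

    python -m spacy init config config.cfg --lang en --pipeline ner
    python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy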