spacy loss-function named-entity-recognition

What loss function does spaCy use for Named Entity Recognition (NER)?

I'm interested in understanding the specific loss function that the spaCy library uses when training Named Entity Recognition models. Is there a standard loss function that spaCy recommends for NER tasks? Are there alternative loss functions recommended for specific NER scenarios? I would also like to know whether the loss function is customizable and how it is implemented within spaCy.

Solution

The answer to this is more complicated than you might expect, because spaCy uses a transition-based NER model with an imitation learning objective. The best description of the algorithm is this video, especially the structured prediction part: https://www.youtube.com/watch?v=sqDHBH9IjRU
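To give a rough idea of what "transition-based" means here, below is a minimal sketch in plain Python. It is not spaCy's code: the function names and the (move, label) tuples are made up for illustration, and it assumes a BILUO-style action set (Begin, In, Last, Unit, Out) applied one token at a time. The point is that the model predicts a sequence of actions rather than a tag per token, and that only some actions are valid in a given state.

```python
def valid_moves(entity_open):
    """Return the moves allowed in the current state (illustrative rules only)."""
    # Inside an entity we can only continue it ("I") or close it ("L").
    # Outside one we can open ("B"), emit a one-token entity ("U"), or skip ("O").
    return {"I", "L"} if entity_open else {"B", "U", "O"}


def apply_actions(tokens, actions):
    """Replay one (move, label) action per token and collect (start, end, label) spans."""
    if len(tokens) != len(actions):
        raise ValueError("need exactly one action per token")
    entities = []
    start = None  # index where the currently open entity began, if any
    for i, (move, label) in enumerate(actions):
        if move not in valid_moves(start is not None):
            raise ValueError(f"move {move!r} is not valid at token {i}")
        if move == "B":
            start = i
        elif move == "U":
            entities.append((i, i + 1, label))
        elif move == "L":
            entities.append((start, i + 1, label))
            start = None
        # "I" keeps the entity open and "O" skips the token: nothing to record.
    return entities


tokens = ["Apple", "hired", "Tim", "Cook"]
actions = [("U", "ORG"), ("O", None), ("B", "PERSON"), ("L", "PERSON")]
print(apply_actions(tokens, actions))
```

During training, the imitation learning part amounts to asking, in whatever state the model has reached, which of the valid actions are still best with respect to the gold annotation, and pushing the scores towards those.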

The actual loss function used to decide between the different actions is also a bit tricky. The implementation is here: https://github.com/explosion/spaCy/blob/0367f864fe90dfa1dcdd0bfaf8f06dbcd5e97e45/spacy/syntax/_parser_model.pyx#L153

I'm sure I've described this in other comments but I can't immediately find it. Basically there may be several equally good transitions, and we want the objective function to account for that. The equations for this are described in Section 4 here: https://aclanthology.org/P05-1022.pdf
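To make the "several equally good transitions" idea concrete, here is a minimal sketch in plain Python. It is my own illustration rather than the linked Cython code: the function name and arguments are hypothetical. It assumes each action has a score, a validity flag, and a cost, and that every valid action tied for the lowest cost counts as "gold". The loss is the negative log of the probability mass that a softmax over the valid actions assigns to the gold set, so with a single gold action it reduces to ordinary cross-entropy.

```python
import math


def multi_gold_log_loss(scores, is_valid, costs):
    """Log loss where several actions may be equally correct (illustrative sketch).

    Returns (loss, grad), where grad[i] is d(loss)/d(scores[i]).
    """
    valid = [i for i, ok in enumerate(is_valid) if ok]
    if not valid:
        raise ValueError("at least one action must be valid")

    # Assumption: gold actions are the valid ones with the lowest cost.
    best_cost = min(costs[i] for i in valid)
    gold = [i for i in valid if costs[i] <= best_cost]

    # Subtract the max score before exponentiating for numerical stability.
    max_score = max(scores[i] for i in valid)
    exp_scores = {i: math.exp(scores[i] - max_score) for i in valid}
    z_all = sum(exp_scores.values())            # normaliser over valid actions
    z_gold = sum(exp_scores[i] for i in gold)   # mass on the gold actions

    loss = -math.log(z_gold / z_all)

    # Gradient: softmax over valid actions minus softmax restricted to gold ones.
    grad = [0.0] * len(scores)
    for i in valid:
        grad[i] = exp_scores[i] / z_all
    for i in gold:
        grad[i] -= exp_scores[i] / z_gold
    return loss, grad


# Two zero-cost actions and one costly one: the model is not penalised for
# preferring either of the two gold actions over the other.
loss, grad = multi_gold_log_loss([1.0, 2.0, 0.5], [True, True, True], [0, 0, 1])
print(loss, grad)
```

The gradient only pushes probability away from the non-gold actions and redistributes it among the gold ones in proportion to their current scores, which is why ties between equally good transitions are left alone.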