I'm interested in understanding the specific loss function spaCy uses when training Named Entity Recognition models. Is there a standard loss function that spaCy recommends for NER tasks? Are there alternative loss functions recommended for specific NER scenarios? I would also like to know whether the loss function is customizable and how it is implemented within spaCy.
The answer to this is more complicated than you might expect, because spaCy uses a transition-based NER model with an imitation learning objective. The best description of the algorithm is this video, especially the structured prediction part: https://www.youtube.com/watch?v=sqDHBH9IjRU
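To make "transition-based" concrete, here is a minimal sketch of how a sequence of actions can be turned into entity spans. It assumes one BILUO-style action per token (Begin, In, Last, Unit, Out); the function name `apply_actions` and the action string format are hypothetical and chosen for illustration only. This is not spaCy's actual implementation, which is written in Cython and scores the valid actions with a neural network at each step.

```python
def apply_actions(tokens, actions):
    """Apply one BILUO-style action per token and return (start, end, label) spans.

    Illustrative only: `apply_actions` and the "B-ORG"-style action strings
    are assumptions for this sketch, not part of spaCy's API.
    """
    if len(tokens) != len(actions):
        raise ValueError("expected exactly one action per token")
    entities = []
    start = None       # start index of the currently open entity, if any
    open_label = None  # label of the currently open entity, if any
    for i, action in enumerate(actions):
        move, _, label = action.partition("-")
        if move in ("O", "U", "B") and start is not None:
            # An open entity must be closed with L before anything else starts.
            raise ValueError(f"action {action!r} is invalid inside an open entity")
        if move in ("I", "L") and (start is None or label != open_label):
            raise ValueError(f"action {action!r} has no matching open entity")
        if move == "U":    # single-token entity
            entities.append((i, i + 1, label))
        elif move == "B":  # open a multi-token entity
            start, open_label = i, label
        elif move == "L":  # close the open entity
            entities.append((start, i + 1, label))
            start, open_label = None, None
        elif move not in ("O", "I"):
            raise ValueError(f"unknown action {action!r}")
    if start is not None:
        raise ValueError("sequence ended with an unclosed entity")
    return entities


tokens = ["Apple", "is", "based", "in", "New", "York", "City"]
actions = ["U-ORG", "O", "O", "O", "B-GPE", "I-GPE", "L-GPE"]
print(apply_actions(tokens, actions))
```

The point of the sketch is that the model never predicts spans directly: it picks one action at a time, and the set of valid actions depends on the current state. That is why the training objective is defined over actions rather than over token labels.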
The actual loss function used to decide between the different actions is also a bit tricky. The implementation is here: https://github.com/explosion/spaCy/blob/0367f864fe90dfa1dcdd0bfaf8f06dbcd5e97e45/spacy/syntax/_parser_model.pyx#L153
I'm sure I've described this in other comments but I can't immediately find it. Basically there may be several equally good transitions, and we want the objective function to account for that. The equations for this are described in Section 4 here: https://aclanthology.org/P05-1022.pdf
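As a rough illustration of an objective that accounts for several equally good transitions, here is a minimal NumPy sketch. It assumes the loss is the negative log of the total probability assigned to the zero-cost (equally good) actions, with the softmax taken over the valid actions only. The function name `multi_gold_log_loss` and its arguments are hypothetical; this shows the general idea and is not a copy of the linked spaCy code.

```python
import numpy as np


def multi_gold_log_loss(scores, costs, is_valid):
    """Negative log of the probability mass on zero-cost actions, plus its gradient.

    Hypothetical helper for illustration; not part of spaCy's API.
    scores:   model score per action
    costs:    oracle cost per action (0 means the action is equally good)
    is_valid: whether each action is allowed in the current state
    """
    scores = np.asarray(scores, dtype=float)
    costs = np.asarray(costs, dtype=float)
    is_valid = np.asarray(is_valid, dtype=bool)
    is_gold = is_valid & (costs <= 0)
    if not is_gold.any():
        raise ValueError("at least one valid zero-cost action is required")

    # p: softmax over all valid actions (invalid actions get probability 0).
    masked = np.where(is_valid, scores, -np.inf)
    p = np.exp(masked - masked.max())
    p /= p.sum()

    # q: softmax restricted to the zero-cost actions. This is the target
    # distribution; it splits the credit among the equally good actions
    # in proportion to the model's current scores.
    gold_masked = np.where(is_gold, scores, -np.inf)
    q = np.exp(gold_masked - gold_masked.max())
    q /= q.sum()

    loss = -np.log(p[is_gold].sum())
    d_scores = p - q  # gradient of the loss with respect to the scores
    return loss, d_scores


# Two equally good actions (cost 0) and one bad action (cost 1).
loss, d_scores = multi_gold_log_loss(
    scores=[1.0, 1.0, 0.0], costs=[0, 0, 1], is_valid=[True, True, True]
)
print(loss, d_scores)
```

With a single zero-cost action this reduces to ordinary softmax cross-entropy. With several, the gradient pushes probability toward the set of good actions as a whole and does not penalize the model for preferring one good action over another.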