Tags: python, spacy, spacy-3

How does spaCy use the Thinc ParserStepModel object in the pipeline?


I'm trying to modify the softmax output from spaCy, but I don't understand how spaCy uses the Thinc predict function.

I had assumed that each time a Thinc model's predict function is called as part of the spaCy pipeline, it would return data in the same format. However, when I place a breakpoint at 'preds' in the code below, I can see that self._func returns data in two formats:

  • A list of numpy arrays, which I believe contain the softmax scores for each of the model's classification predictions.
  • A spacy.ml.parser_model.ParserStepModel object. I'm not sure how or why the model returns data in this format.

I was hoping someone could explain why the Thinc model returns a ParserStepModel object and how it's used as part of the spaCy pipeline. I'd also like to know how I can detect the data type of 'preds' (I've tried isinstance without success).

import spacy
from thinc.model import Model, InT, OutT
import numpy as np

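# Replacement for Model.predict, used to inspect what each model returns.
def predict(self, X: InT) -> OutT:
    # self._func is the model's forward function; it returns (output, backprop callback).
    preds = self._func(self, X, is_train=False)[0]
    return preds

# Monkey-patch every Thinc model, so this runs for each model in the pipeline.
Model.predict = predict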

nlp = spacy.load('en_core_web_sm')

def show_ents(doc):
    if doc.ents:
        for ent in doc.ents:
            print(ent.text + ' - ' + str(ent.start_char) + ' - ' + str(ent.end_char) + ' - ' +
                  ent.label_ + ' - ' + str(spacy.explain(ent.label_)))
    else:
        print('No named entities found.')

doc = nlp('Apple is looking at buying U.K. startup for $1 billion')

show_ents(doc)

Solution

  • This happens because your spaCy pipeline contains two Models. The tok2vec runs first and creates an embedding for each token; those embeddings are then used as features by the parser. See the pipeline docs.

    If you have trouble finding the type of anything, it's probably a Cython type, and you'd need to check the Cython source in spaCy or Thinc. I'm not sure what a "press" is; how are you getting it? (Make a new question for that.)
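One way to see which model produced which kind of output, without importing the class you want to test against, is to read the type's module and qualified name and log them per call. The sketch below is only an illustration of that idea: `qualified_type_name`, `describe`, `log_predict` and `StandInModel` are hypothetical names, the stand-in class replaces thinc's Model so the snippet runs on its own, and the model names passed to it are just labels.

```python
import functools


def qualified_type_name(obj):
    """Return 'module.ClassName' for obj; this also works for extension types."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


def describe(value):
    """Summarise a return value, looking one level deep into lists and tuples."""
    if isinstance(value, (list, tuple)):
        inner = sorted({qualified_type_name(item) for item in value})
        return f"{qualified_type_name(value)} of {inner}"
    return qualified_type_name(value)


def log_predict(cls, log):
    """Wrap cls.predict so each call appends (model name, output description) to log."""
    original = cls.predict

    @functools.wraps(original)
    def predict(self, X):
        result = original(self, X)
        log.append((getattr(self, "name", "?"), describe(result)))
        return result

    cls.predict = predict


# Hypothetical stand-in for a model class, used only to keep the sketch self-contained.
class StandInModel:
    def __init__(self, name, output):
        self.name = name
        self._output = output

    def predict(self, X):
        return self._output


calls = []
log_predict(StandInModel, calls)
StandInModel("tok2vec", [1.0, 2.0]).predict(None)
StandInModel("parser_model", {"step": "model"}).predict(None)
for name, description in calls:
    print(name, "->", description)
```

Comparing the resulting string (for example, checking whether it ends in "ParserStepModel") sidesteps the import that isinstance would need. The same wrapper could presumably be pointed at thinc's Model instead of the stand-in, since the question already replaces Model.predict, but that is an assumption and has not been checked here.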