I was wondering how I could use the pretrained transformer model `en_trf_bertbaseuncased_lg` from spaCy for downstream NLP tasks (NER, POS tagging, etc.). The documentation states that the model ships with only the following pipeline components (https://spacy.io/models/en#en_trf_bertbaseuncased_lg):
```python
>>> import spacy
>>> nlp = spacy.load("en_trf_bertbaseuncased_lg")
>>> nlp.pipe_names
['sentencizer', 'trf_wordpiecer', 'trf_tok2vec']
```

Can anyone explain what these components do and in which tasks they can be used? Or does anyone know a good source to read about them?
**`trf_wordpiecer` component**

Runs the BERT wordpiece pre-processing on each `Doc` and stores the resulting alignment between the wordpieces and spaCy's linguistic tokenization in `doc._.trf_alignment`.
Quote from the docs:

> Wordpiece is convenient for training neural networks, but it doesn't produce segmentations that match up to any linguistic notion of a "word". Most rare words will map to multiple wordpiece tokens, and occasionally the alignment will be many-to-many.
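To make that alignment concrete, here is a minimal, self-contained sketch (plain Python, no spaCy or model download required; the token/wordpiece split is hand-chosen for illustration, not produced by the real BERT tokenizer):

```python
# Hand-rolled illustration of token-to-wordpiece alignment.
# The wordpieces follow BERT's convention of prefixing
# word-internal pieces with "##"; the split is hypothetical.

tokens = ["spacy", "handles", "tokenization"]
wordpieces = ["spa", "##cy", "handles", "token", "##ization"]

def align(tokens, wordpieces):
    """Greedily map each token to the indices of the wordpieces
    that spell it out (several wordpieces can belong to one token)."""
    alignment = []
    i = 0
    for tok in tokens:
        piece_ids, built = [], ""
        while built != tok and i < len(wordpieces):
            built += wordpieces[i].lstrip("#")
            piece_ids.append(i)
            i += 1
        alignment.append(piece_ids)
    return alignment

print(align(tokens, wordpieces))
# [[0, 1], [2], [3, 4]] -- the rarer words map to two wordpieces each
```

This is the kind of mapping `doc._.trf_alignment` exposes: for each spaCy token, the indices of the wordpiece rows that belong to it.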
**`trf_tok2vec` component**

Runs the transformer over the wordpieces and stores the raw output of the model's last hidden layer in `doc._.trf_last_hidden_state`; the activations aligned back to spaCy's tokenization are exposed as `doc.tensor`.
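The step from per-wordpiece activations to per-token vectors can be sketched with NumPy. This is an illustration of the idea with made-up numbers, not spaCy's actual implementation (the real pooling strategy may differ, and BERT-base produces 768-dimensional vectors, not 4):

```python
import numpy as np

# Hypothetical last hidden state: 5 wordpieces x 4 dims.
last_hidden_state = np.arange(20, dtype=float).reshape(5, 4)

# Alignment from the wordpiecer: token i -> wordpiece indices.
alignment = [[0, 1], [2], [3, 4]]

# Mean-pool the wordpiece vectors for each token, yielding one
# dense vector per spaCy token (the role doc.tensor plays).
token_vectors = np.stack(
    [last_hidden_state[ids].mean(axis=0) for ids in alignment]
)
print(token_vectors.shape)  # (3, 4): one vector per spaCy token
```

A token backed by a single wordpiece simply inherits that wordpiece's vector; a many-wordpiece token gets the average of its pieces.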
See also this blog article introducing spaCy's transformer integration.
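As for downstream use: once every token (or a whole document, e.g. by averaging its token vectors) has a dense representation, those vectors can feed tasks such as similarity search or classification. A sketch with made-up vectors standing in for real BERT-derived ones:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Made-up document vectors standing in for e.g. averaged
# doc.tensor rows; with the real model you would compare
# actual transformer-derived representations.
doc1 = np.array([0.9, 0.1, 0.3])
doc2 = np.array([0.8, 0.2, 0.4])
print(cosine(doc1, doc2))  # close to 1.0 for similar documents
```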