
How to use the pretrained transformer model ("en_trf_bertbaseuncased_lg") in SpaCy?


I was wondering how I could use the pretrained transformer model en_trf_bertbaseuncased_lg from spaCy for future NLP tasks (NER, POS tagging, etc.). The documentation states that the model can only be used with the following pipeline components (https://spacy.io/models/en#en_trf_bertbaseuncased_lg):

  • sentencizer
  • trf_wordpiecer
  • trf_tok2vec

Can anyone explain to me what these components do and for which tasks they can be used? Or does anyone know a good source to read about them?

>>> import spacy
>>> nlp = spacy.load("en_trf_bertbaseuncased_lg")
>>> nlp.pipe_names
['sentencizer', 'trf_wordpiecer', 'trf_tok2vec']

Solution

  • trf_wordpiecer component

    • accessible via doc._.trf_alignment
    • performs the model’s wordpiece pre-processing

    Quote from the docs:

    Wordpiece is convenient for training neural networks, but it doesn't produce segmentations that match up to any linguistic notion of a "word". Most rare words will map to multiple wordpiece tokens, and occasionally the alignment will be many-to-many.

  • trf_tok2vec component

    • accessible via doc._.trf_last_hidden_state
    • stores the raw outputs of the transformer: one tensor with one row per wordpiece token
    • what you probably want, however, are the token-aligned features in doc.tensor
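The many-to-one (and occasionally many-to-many) segmentation the quote describes can be illustrated with a toy greedy longest-match-first WordPiece tokenizer. This is a simplified sketch of the scheme BERT-style models use, not spaCy's actual trf_wordpiecer component, and the vocabulary here is made up:

```python
# Toy greedy WordPiece tokenizer: longest-match-first, with "##" marking
# word-internal pieces. Simplified sketch only, not spaCy's implementation;
# the vocabulary is invented for illustration.
VOCAB = {"un", "##aff", "##able", "hello", "world"}

def wordpiece(word, vocab=VOCAB):
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        prefix = "##" if start > 0 else ""  # mark word-internal pieces
        while end > start:
            piece = prefix + word[start:end]
            if piece in vocab:  # take the longest matching piece
                pieces.append(piece)
                break
            end -= 1
        else:
            return ["[UNK]"]  # no piece matched at this position
        start = end
    return pieces

print(wordpiece("hello"))      # a common word maps to a single wordpiece
print(wordpiece("unaffable"))  # a rare word maps to several wordpieces
```

Note how a common word stays a single piece while a rare word is split into several, which is exactly why an alignment (doc._.trf_alignment) between linguistic tokens and wordpieces is needed.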

    See also this blog article introducing spaCy's transformer integration.
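To see how the token-aligned features in doc.tensor relate to the raw per-wordpiece tensor in doc._.trf_last_hidden_state, here is a minimal numpy sketch that pools wordpiece rows back onto the original tokens via an alignment. The shapes, the example alignment, and the sum-pooling are illustrative assumptions, not spaCy's exact code:

```python
import numpy as np

# Illustrative sketch: pool per-wordpiece hidden states back onto tokens
# using an alignment (token index -> list of wordpiece row indices).
# Shapes and sum-pooling are assumptions for illustration, not spaCy's code.
hidden = np.arange(12, dtype=float).reshape(4, 3)  # 4 wordpieces, dim 3
alignment = [[0], [1, 2], [3]]  # token 1 was split into 2 wordpieces

def token_aligned(hidden, alignment):
    # one row per original token: combine the rows of its wordpieces
    return np.stack([hidden[rows].sum(axis=0) for rows in alignment])

tensor = token_aligned(hidden, alignment)
print(tensor.shape)  # one vector per token, not per wordpiece
```

The result has one row per linguistic token, which is the shape downstream components (NER, POS tagging, etc.) expect, whereas the raw transformer output has one row per wordpiece.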