
How does spaCy split "'s"?


spaCy tokenizes "name's" as two tokens: name and 's. How can I combine those two tokens? And which rule defines the splitting of "'s" — the infix rules, or something else?


Solution

  • For spaCy v2.2.3+, you can use nlp.tokenizer.explain() to see which tokenizer settings lead to particular tokens:

    import spacy
    nlp = spacy.blank("en")
    
    nlp.tokenizer.explain("name's")
    # [('TOKEN', 'name'), ('SUFFIX', "'s")]
    

    For English, variants of 's are matched by the suffix_search setting (not the infixes). You can modify the suffix regex to change how the tokenizer handles this: https://spacy.io/usage/linguistic-features#native-tokenizer-additions
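    As a sketch of that approach, you can rebuild the suffix regex without the apostrophe-s patterns and assign it back to the tokenizer. (The exact set of apostrophe variants in the defaults may differ between spaCy versions; the ones below are an assumption based on the English defaults.)

    ```python
    import spacy
    from spacy.util import compile_suffix_regex

    nlp = spacy.blank("en")

    # Drop the possessive 's patterns from the default suffixes
    # (both the plain apostrophe and the unicode right-single-quote variants)
    suffixes = [s for s in nlp.Defaults.suffixes if s not in ("'s", "'S", "’s", "’S")]
    nlp.tokenizer.suffix_search = compile_suffix_regex(suffixes).search

    doc = nlp("name's")
    print([t.text for t in doc])
    ```

    With the 's suffixes removed, "name's" should come out as a single token.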
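    Alternatively, if you want to keep the default tokenization but join the two pieces back together afterwards, you can merge them with Doc.retokenize (a sketch; "name's" is just the example string from the question):

    ```python
    import spacy

    nlp = spacy.blank("en")
    doc = nlp("name's")
    print([t.text for t in doc])  # ['name', "'s"]

    # Merge the two tokens back into a single token
    with doc.retokenize() as retokenizer:
        retokenizer.merge(doc[0:2])

    print([t.text for t in doc])  # ["name's"]
    ```

    This changes only the Doc object in place, so the tokenizer rules stay untouched for other texts.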