How to set annotations to treat labels as nouns in spaCy library, Python


I have this labelled sentence:

[x] moved to [y] in [z].

How can I annotate [x] and [y] as nouns and [z] as a datetime? I looked at https://spacy.io/usage/linguistic-features#native-tokenizer-additions but either did not find what I needed or missed it.


Solution

  • You can set the tag with tokenizer special cases (https://spacy.io/usage/linguistic-features#special-cases). For [z], for example:

    orth = "[z]"
    nlp.tokenizer.add_special_case(orth, [{"ORTH": orth, "TAG": "NUM"}])
    

    (It's honestly kind of weird to have the tokenizer setting tags, but this functionality is there for now.)
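
    In spaCy v3, tokenizer special cases accept only ORTH and NORM, so the snippet above applies to v2. A minimal sketch of a possible v3 equivalent follows, assuming a blank English pipeline and the built-in attribute_ruler component. The special cases keep each placeholder as one token, and the ruler then assigns the POS. NUM for [z] follows the answer above; spaCy has no datetime part of speech.

```python
import spacy

# Blank English pipeline (assumption): tokenizer only, no trained model needed.
nlp = spacy.blank("en")

# Keep each bracketed placeholder together as a single token.
for placeholder in ("[x]", "[y]", "[z]"):
    nlp.tokenizer.add_special_case(placeholder, [{"ORTH": placeholder}])

# Assign coarse-grained POS values by exact token text.
ruler = nlp.add_pipe("attribute_ruler")
ruler.add(patterns=[[{"ORTH": "[x]"}], [{"ORTH": "[y]"}]], attrs={"POS": "NOUN"})
ruler.add(patterns=[[{"ORTH": "[z]"}]], attrs={"POS": "NUM"})

doc = nlp("[x] moved to [y] in [z].")
for token in doc:
    print(token.text, token.pos_)
```

    With a trained pipeline that already includes an attribute ruler, the same rules could probably be added to the existing component via nlp.get_pipe("attribute_ruler") instead of adding a new one.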