
Label schemes by language in spaCy


From the spaCy documentation:

For a list of the fine-grained and coarse-grained part-of-speech tags assigned by spaCy’s models across different languages, see the label schemes documented in the models directory.

I assume this is referring to the part-of-speech tags, e.g. VERB, NOUN, NUM, and that this list will be different for each language.

Is this a correct assumption?

I followed the link in the documentation to the models directory, but could not find a list of the valid POS tags for each language.

https://spacy.io/usage/linguistic-features#pos-tagging

Answer

Thanks to @polm23 for the answer. Here's a screenshot of the navigation, in case anyone else can't find it.



Solution

  • Look for the "label scheme" section on the page for any individual language in the models directory.


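    The coarse-grained tags such as VERB and NOUN, which go in the .pos attribute (.pos_ for the string form), come from the Universal Dependencies POS tag set, so they are essentially the same across languages. The fine-grained tags, which go in the .tag attribute, are unique to each language as far as I'm aware: they follow whatever tag set the model's training data used (for English, Penn Treebank tags such as NN and VBZ).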
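If you'd rather inspect the tags from code than hunt through the website, a minimal sketch like the one below can help. It assumes spaCy v3, where `Language.pipe_labels` (component name to label list) and `spacy.explain` (glossary lookup for a tag) exist. The names `UPOS_TAGS`, `explain_tags` and `show_tags` are hypothetical helpers for illustration, not part of spaCy, and the usage note assumes the `en_core_web_sm` pipeline is installed.

```python
import spacy

# The 17 Universal POS tags defined by Universal Dependencies (v2).
# Token.pos_ draws from this shared set in every language; spaCy also
# uses the extra value "SPACE" for whitespace tokens.
UPOS_TAGS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})


def explain_tags(tags):
    """Hypothetical helper: map each tag to spaCy's glossary text (None if absent)."""
    return {tag: spacy.explain(tag) for tag in tags}


def show_tags(nlp, text):
    """Hypothetical helper: print coarse (pos_) and fine-grained (tag_) tags.

    `nlp` is any loaded spaCy pipeline.
    """
    # pipe_labels maps each component name to its labels; for pipelines
    # with a "tagger" component, that entry is the fine-grained tag set.
    print("tagger labels:", nlp.pipe_labels.get("tagger"))
    for token in nlp(text):
        print(token.text, token.pos_, token.tag_, spacy.explain(token.tag_))
```

For example, after `python -m spacy download en_core_web_sm`, calling `show_tags(spacy.load("en_core_web_sm"), "She runs quickly.")` lists the English pipeline's fine-grained label scheme and then pairs each token's Universal POS value with its language-specific tag. Running the same helper on a pipeline for another language should show the same kind of `pos_` values but a different set of `tag_` values.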