
What tag set is used in OpenNLP's German maxent model?


Currently I am using the OpenNLP tools to PoS-tag German sentences, with the maxent model listed on their download site:

de      POS Tagger      Maxent model trained on tiger corpus.   de-pos-maxent.bin

This works very well, and I get results such as:

Diese, Community, bietet, Teilnehmern, der, Veranstaltungen, die, Möglichkeit ...
PDAT, FM, VVFIN, NN, ART, NN, ART, NN ...

I want to do some further processing on the tagged sentences, for which I need to know what each tag means. Unfortunately, searching the OpenNLP wiki for the tag sets isn't very helpful, as it only says:

TODO: Add more tag sets, also for non-english languages

Does anyone know where I can find the tag set used in the German maxent model?


Solution

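  • It seems very likely that the STTS tag set (Stuttgart-Tübingen-TagSet) is used. The TIGER corpus the model was trained on is annotated with STTS, with minor modifications, and the tags in your output (PDAT, VVFIN, NN, ART, FM) are all STTS tags. STTS is also said to be the most common tag set for German, e.g. in this question or in this Wikipedia entry.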
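
For the further processing mentioned in the question, a small lookup table is probably enough. The following is only a sketch, assuming the model really does emit STTS tags: the class `SttsTags` and its methods are hypothetical helpers and not part of OpenNLP, and the map covers only the five tags from the sample output, with glosses paraphrased into English. The complete STTS inventory is considerably larger and would have to be filled in from the STTS guidelines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of OpenNLP): maps a few STTS tags to English glosses.
final class SttsTags {

    // Only the tags that occur in the sample output from the question.
    // Assumption: the model uses STTS; extend this map for the full tag set.
    private static final Map<String, String> DESCRIPTIONS = Map.of(
        "PDAT", "attributive demonstrative pronoun",
        "FM", "foreign-language material",
        "VVFIN", "finite full verb",
        "NN", "common noun",
        "ART", "definite or indefinite article"
    );

    // Returns the gloss for a tag, or "unknown tag" if it is not in the map.
    static String describe(String tag) {
        return DESCRIPTIONS.getOrDefault(tag, "unknown tag");
    }

    // Produces one "TAG = gloss" line per tag, in input order.
    static List<String> describeAll(String[] tags) {
        List<String> lines = new ArrayList<>();
        for (String tag : tags) {
            lines.add(tag + " = " + describe(tag));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Tag sequence copied from the question.
        String[] tags = {"PDAT", "FM", "VVFIN", "NN", "ART", "NN", "ART", "NN"};
        for (String line : describeAll(tags)) {
            System.out.println(line);
        }
    }
}
```

Returning "unknown tag" instead of throwing makes it easy to spot tags that the table does not cover yet, which is useful while you are still confirming which tag set the model actually produces.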