python nltk pos-tagger

POS tagging German texts using NLTK


I want to use NLTK to POS-tag German texts. I found some references on the web, but most of them are outdated. Some, for example, refer to a "EUROPARL" corpus, but it looks like only "EUROPARL_raw" is still available, and that one is not POS-tagged. I also found some references to using the TIGER corpus, but the latest version seems to be in a format that I cannot parse with NLTK out of the box.

I'm aware of some non-NLTK alternatives, but I would prefer to use NLTK. Could somebody provide a simple example of POS tagging based on a German corpus?


Solution

  • I was unable to find a tagged German corpus to use with NLTK. If you require a pre-tagged corpus, you may be out of luck with NLTK. There is an open issue ticket for this very problem, but there has been no progress (Reading Negra Corpus Files).

    You could tag your own corpus using NLTK-Trainer and the Negra corpus. It would require knowledge of German grammar but no coding. See the demonstration of NLTK-Trainer.
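
If you do obtain a tagged German corpus, training a tagger in NLTK mostly comes down to getting the data into a list of sentences, each a list of (word, tag) pairs. The following is only a minimal sketch under assumptions: the two-column, tab-separated layout, the tiny STTS-tagged sample, and the helper names `read_tagged_sentences` and `train_tagger` are all hypothetical and made up for illustration. A real corpus export will have a different column layout, so `word_col` and `tag_col` would need adjusting.

```python
def read_tagged_sentences(lines, word_col=0, tag_col=1):
    """Parse one-token-per-line, tab-separated text into sentences.

    Returns a list of sentences, each a list of (word, tag) tuples.
    A blank line marks the end of a sentence. The column indices are
    assumptions; adjust them to match the actual corpus file.
    """
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # Blank line: close the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")
        current.append((cols[word_col], cols[tag_col]))
    if current:
        # The last sentence may not be followed by a blank line.
        sentences.append(current)
    return sentences


def train_tagger(sentences):
    """Train a unigram tagger that falls back to 'NN' for unseen words."""
    import nltk  # imported here so the parsing part works without NLTK

    backoff = nltk.DefaultTagger("NN")
    return nltk.UnigramTagger(train=sentences, backoff=backoff)


# Hypothetical hand-tagged sample (STTS tags), for illustration only.
SAMPLE = [
    "Das\tART",
    "ist\tVAFIN",
    "ein\tART",
    "Haus\tNN",
    ".\t$.",
    "",
    "Der\tART",
    "Hund\tNN",
    "schläft\tVVFIN",
    ".\t$.",
]

sentences = read_tagged_sentences(SAMPLE)
```

With NLTK installed, `train_tagger(sentences).tag(["Der", "Hund", "schläft"])` would then tag a tokenized sentence. A unigram tagger trained on so little data is of course useless in practice; the point is only the shape of the training data that NLTK's trainable taggers expect.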