python nlp nltk

POS tagging in German


I am using NLTK to extract nouns from a text string, starting with the following command:

tagged_text = nltk.pos_tag(nltk.Text(nltk.word_tokenize(some_string)))

It works fine in English. Is there an easy way to make it work for German as well?

(I have no experience with natural language processing, but I managed to use the Python NLTK library, which has been great so far.)


Solution

  • Natural language software does its magic by leveraging corpora and the statistics they provide. The default model behind nltk.pos_tag is trained on English text, so you'll need to tell NLTK about a German corpus to help it tokenize and tag German correctly. I believe the EUROPARL corpus might help get you going.

    See nltk.corpus.europarl_raw and this answer for example configuration.

    Also, consider tagging this question with "nlp".
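
To make the corpus idea concrete, here is a minimal sketch of training a simple tagger on tagged German sentences and then pulling out the nouns. It rests on assumptions: the three training sentences are a hypothetical toy sample (a usable tagger needs a large tagged German corpus, which you would have to obtain separately), the tags follow the STTS convention, and build_german_tagger and extract_nouns are illustrative helper names, not part of NLTK. It uses wordpunct_tokenize instead of word_tokenize so that no model download is needed.

```python
from nltk.tag import DefaultTagger, UnigramTagger
from nltk.tokenize import wordpunct_tokenize

# Hypothetical toy training data with STTS-style tags.
# Assumption: in practice this would come from a large tagged German corpus.
TOY_TRAIN_SENTS = [
    [("Der", "ART"), ("Hund", "NN"), ("schläft", "VVFIN"), (".", "$.")],
    [("Die", "ART"), ("Katze", "NN"), ("sieht", "VVFIN"),
     ("den", "ART"), ("Hund", "NN"), (".", "$.")],
    [("Das", "ART"), ("Haus", "NN"), ("ist", "VAFIN"), ("groß", "ADJD"), (".", "$.")],
]


def build_german_tagger(train_sents):
    """Train a unigram tagger; words not seen in training fall back to 'NN'."""
    fallback = DefaultTagger("NN")
    return UnigramTagger(train_sents, backoff=fallback)


def extract_nouns(text, tagger):
    """Tokenize with a regex tokenizer and keep tokens tagged NN or NE."""
    tokens = wordpunct_tokenize(text)
    return [word for word, tag in tagger.tag(tokens) if tag in ("NN", "NE")]


tagger = build_german_tagger(TOY_TRAIN_SENTS)
nouns = extract_nouns("Die Katze sieht den Hund.", tagger)
print(nouns)
```

With so little training data, every unknown word is tagged as a noun by the fallback, so results on real text will be poor until the toy sample is replaced by a proper tagged corpus.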