Search code examples
python-3.xnltkpos-taggerfrench

How to POS_TAG a french sentence?


I'm looking for a way to pos_tag a French sentence like the following code is used for English sentences:

def pos_tagging(sentence):
    var = sentence
    exampleArray = [var]
    for item in exampleArray:
        tokenized = nltk.word_tokenize(item)
        tagged = nltk.pos_tag(tokenized)
        return tagged

Solution

  • The NLTK doesn't come with pre-built resources for French. I recommend using the Stanford tagger, which comes with a trained French model. This code shows how you might set up the nltk for use with Stanford's French POS tagger. Note that the code is outdated (and for Python 2), but you could use it as a starting point.

    Alternately, the NLTK makes it very easy to train your own POS tagger on a tagged corpus, and save it for later use. If you have access to a (sufficiently large) French corpus, you can follow the instructions in the nltk book and simply use your corpus in place of the Brown corpus. You're unlikely to match the performance of the Stanford tagger (unless you can train a tagger for your specific domain), but you won't have to install anything.