Tags: python, nltk, pos-tagger

NLTK tagger: reading from a .txt file


I use NLTK in Python. I want to read text from a .txt file and tag it with the default, unigram, and POS taggers. However, I could not get this to work because there is no specific import function for .txt files; in class, we use prepared corpora such as Brown. How can I load my own text file so that I can use it with these taggers? Eventually, I want to evaluate the performance of each tagger.


Solution

  • Read a file like this:

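    import nltk

    with open('your-file.txt', encoding='utf-8') as f:  # 'rU' means universal newlines, not Unicode, and was removed in Python 3.11
        raw = f.read()
    tokens = nltk.word_tokenize(raw)  # requires NLTK's Punkt tokenizer data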
    

    Once the text is tokenized, you can tag it, for example:

    def_tagger = nltk.DefaultTagger('NN')  # assigns 'NN' to every token
    tagged = def_tagger.tag(tokens)        # list of (token, tag) pairs
    

    This tags every token as NN. To evaluate it, you need a gold standard: tag the text by hand as a list of sentences, each a list of (word, tag) pairs, and then:

    def_tagger.evaluate(your_manually_tagged_sents)  # in NLTK 3.6+ prefer def_tagger.accuracy(...)
    

    This returns the tagger's accuracy, i.e. the fraction of tokens tagged correctly, from 0 (no token correct) to 1 (perfect match).
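The reading step above produces one flat token list, but NLTK taggers are trained and evaluated on lists of sentences. A minimal sketch, assuming (hypothetically) that the file holds one sentence per line. It substitutes the regex-based `wordpunct_tokenize` for `word_tokenize` so that no tokenizer models need to be downloaded. The function name `read_sentences` is illustrative only:

```python
import os
import tempfile

import nltk
from nltk.tokenize import wordpunct_tokenize


def read_sentences(path):
    """Read a UTF-8 text file and return one token list per non-empty line.

    Hypothetical helper: assumes the file holds one sentence per line.
    """
    sentences = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                # Regex tokenizer (word characters vs. punctuation runs),
                # so no downloaded tokenizer data is needed.
                sentences.append(wordpunct_tokenize(line))
    return sentences


if __name__ == "__main__":
    # Write a small throwaway file so the demo is self-contained.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", encoding="utf-8", delete=False) as tmp:
        tmp.write("The cat sat.\n\nHello, world!\n")
    try:
        sents = read_sentences(tmp.name)
        print(sents)  # [['The', 'cat', 'sat', '.'], ['Hello', ',', 'world', '!']]
        # tag_sents tags each sentence; DefaultTagger gives every token 'NN'.
        print(nltk.DefaultTagger("NN").tag_sents(sents))
    finally:
        os.remove(tmp.name)
```

If sentences are not one per line, `nltk.sent_tokenize` can split the raw text first. Like `word_tokenize`, it needs the Punkt data.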
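The question also asks about a unigram tagger. Unlike `DefaultTagger`, `UnigramTagger` must first be trained on sentences that are already tagged. A common NLTK pattern is to give it a `DefaultTagger` as backoff for words it never saw in training. The tiny training set below is made-up illustration data, not a real corpus:

```python
import nltk

# Made-up, hand-tagged training sentences: lists of (word, tag) pairs.
train_sents = [
    [("the", "DT"), ("cat", "NN"), ("sat", "VBD")],
    [("the", "DT"), ("dog", "NN"), ("ran", "VBD")],
]

default_tagger = nltk.DefaultTagger("NN")
# UnigramTagger tags each word with its most frequent tag in the training
# data and defers to the backoff tagger for words it has no entry for.
unigram_tagger = nltk.UnigramTagger(train=train_sents, backoff=default_tagger)

tokens = ["the", "bird", "sat"]
print(default_tagger.tag(tokens))  # [('the', 'NN'), ('bird', 'NN'), ('sat', 'NN')]
print(unigram_tagger.tag(tokens))  # [('the', 'DT'), ('bird', 'NN'), ('sat', 'VBD')]
```

"bird" never appears in the training data, so the unigram tagger falls back to `NN` for it.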