
How to tokenize a file?


I want to be able to analyze a local txt file I have using NLTK. By analyze, I mean use NLTK capabilities such as tokenizing, sentiment analysis etc.

I have a local file in my Python directory named 'example.txt'.

import nltk
from nltk.tokenize import sent_tokenize, word_tokenize


with open ('example.txt', 'r') as f:
    for line in f:
        f_contents = f.readlines()
        print(word_tokenize(f_contents))

I am trying to print 'f_contents' in tokenized format. 'f_contents' in this case should be the text within 'example.txt'.

Any help would be appreciated.


Solution

  • The input to word_tokenize should be a string.

    But you're feeding it the output of f.readlines(), which is a list of strings.

    Also, iterating over a file object already yields it line by line, so calling readlines() inside the loop is redundant — and since readlines() consumes the rest of the file, the loop effectively runs only once.

    So simply:

    from nltk.tokenize import word_tokenize
    
    with open('example.txt') as fin:
        for line in fin:
            print(word_tokenize(line))
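
    If you want the whole file as one string instead — useful for sent_tokenize, which you also imported — call f.read() rather than readlines(). A minimal sketch (it writes a small stand-in 'example.txt' so it runs anywhere; substitute your own file):

    ```python
    import nltk
    from nltk.tokenize import sent_tokenize, word_tokenize
    
    # Fetch the Punkt tokenizer models if they aren't installed yet.
    nltk.download('punkt', quiet=True)
    nltk.download('punkt_tab', quiet=True)  # needed on newer NLTK versions
    
    # Stand-in for your 'example.txt', so the sketch is self-contained.
    with open('example.txt', 'w') as f:
        f.write("NLTK makes tokenizing easy. It works on whole files too.")
    
    with open('example.txt') as fin:
        text = fin.read()  # one string for the entire file
    
    sentences = sent_tokenize(text)  # list of sentence strings
    words = word_tokenize(text)      # list of word/punctuation tokens
    print(sentences)
    print(words)
    ```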