python python-2.7 nltk tokenize

How to use a text file as input to NLTK's tokenize.regexp in Python


Basically, I have a text file that I want to use as input to NLTK's tokenize.regexp. How do I feed the text file into the code below?

    from nltk.tokenize import RegexpTokenizer

    tokenizer = RegexpTokenizer(r'\w+')
    raw = doc_a.lower() #instead of 'doc_a' i want my text file as input
    tokens = tokenizer.tokenize(raw)


Solution

  • Before this line:

    raw = doc_a.lower() #instead of 'doc_a' i want my text file as input
    

    add code to read doc_a from your file, like this:

    with open(r'path_to\my_text_file.txt', 'r') as f:
        doc_a = f.read()
    

    then continue with lowercasing and tokenizing.
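
Putting the pieces together, the whole flow might look like the sketch below. It assumes Python 3 and a UTF-8 text file. The helper name `tokenize_file`, its `pattern` and `encoding` parameters, and the throwaway demo file are illustrative and not part of the original question. If NLTK cannot be imported, the sketch falls back to `re.findall`, which should return the same tokens for this simple pattern.

```python
import os
import tempfile

try:
    from nltk.tokenize import RegexpTokenizer
except ImportError:
    # Assumption: fall back to the standard library when NLTK is not installed.
    RegexpTokenizer = None
    import re


def tokenize_file(path, pattern=r'\w+', encoding='utf-8'):
    """Read a text file, lowercase it, and return the tokens matching pattern."""
    with open(path, 'r', encoding=encoding) as f:
        raw = f.read().lower()
    if RegexpTokenizer is not None:
        return RegexpTokenizer(pattern).tokenize(raw)
    return re.findall(pattern, raw)


# Demo with a throwaway file; replace demo_path with the path to your own file.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write("Hello, World! NLTK reads text files.")
    demo_path = tmp.name

tokens = tokenize_file(demo_path)
os.remove(demo_path)
print(tokens)  # ['hello', 'world', 'nltk', 'reads', 'text', 'files']
```

On Python 2.7, the built-in `open` does not accept an `encoding` argument; `io.open` takes the same arguments and could be used instead.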