Basically I have a text file that I want to use as input to NLTK's `tokenize.regexp`. How can I feed a text file into the code below?
```python
from nltk.tokenize import RegexpTokenizer
tokenizer = RegexpTokenizer(r'\w+')
raw = doc_a.lower()  # instead of 'doc_a' I want my text file as input
tokens = tokenizer.tokenize(raw)
```
Before this line:

```python
raw = doc_a.lower()  # instead of 'doc_a' I want my text file as input
```

add code that reads `doc_a` from your file, like this:

```python
with open(r'path_to\my_text_file.txt', 'r') as f:
    doc_a = f.read()
```

(Note: avoid naming the file handle `input`, since that shadows Python's built-in `input` function.) Then continue with lowercasing and tokenizing as before.
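Putting it all together, here is a minimal end-to-end sketch. It creates a small sample file with `tempfile` purely for demonstration (the sample text and the temporary file are assumptions, not part of your setup); in your case you would just open your own text file's path:

```python
import os
import tempfile
from nltk.tokenize import RegexpTokenizer

# Create a throwaway sample file standing in for your real text file
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("Hello World, this is NLTK!")
    path = tmp.name

# Read the whole file into a string
with open(path, 'r') as f:
    doc_a = f.read()

# Lowercase, then split on runs of word characters (\w+ drops punctuation)
tokenizer = RegexpTokenizer(r'\w+')
tokens = tokenizer.tokenize(doc_a.lower())
print(tokens)  # ['hello', 'world', 'this', 'is', 'nltk']

os.remove(path)  # clean up the demo file
```

For a large file you may prefer to read and tokenize line by line instead of calling `f.read()` once, but for typical document sizes reading the whole file is simplest.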