I'm having trouble figuring out how to lemmatize words from a text file. I've gotten as far as listing the words, but I'm not sure how to lemmatize them afterward.
Here's what I have:
import nltk, re
nltk.download('wordnet')
from nltk.stem.wordnet import WordNetLemmatizer

def lemfile():
    f = open('1865-Lincoln.txt', 'r')
    text = f.read().lower()
    f.close()
    text = re.sub('[^a-z\ \']+', " ", text)
    words = list(text.split())
Initialise a WordNetLemmatizer object, and lemmatize each word in your lines. You can perform in-place file I/O using the fileinput module.
# https://stackoverflow.com/a/5463419/4909087
import fileinput

lemmatizer = WordNetLemmatizer()
for line in fileinput.input('1865-Lincoln.txt', inplace=True, backup='.bak'):
    line = ' '.join(
        [lemmatizer.lemmatize(w) for w in line.rstrip().split()]
    )
    # overwrites current `line` in file
    print(line)
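With inplace=True, fileinput.input redirects stdout to the file being processed, so each print call writes the lemmatized line back into it. The original contents are kept in a backup file with the .bak extension.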
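If you would rather keep the file untouched and lemmatize the word list you already build in the question, something along these lines may work. This is only a sketch: the helper name lemmatize_text and its lemmatize parameter are made up for illustration, and it assumes the wordnet corpus has already been downloaded with nltk.download('wordnet').

```python
import re


def lemmatize_text(text, lemmatize):
    """Lowercase the text, replace everything except letters, spaces and
    apostrophes with a space, and apply `lemmatize` to each word.

    `lemmatize` is any callable mapping a word to its lemma
    (hypothetical parameter, introduced for this sketch).
    """
    cleaned = re.sub(r"[^a-z ']+", " ", text.lower())
    return [lemmatize(word) for word in cleaned.split()]


def lemfile(path="1865-Lincoln.txt"):
    """Read the file at `path` and return its words, lemmatized."""
    # Imported here so lemmatize_text stays usable without NLTK installed.
    from nltk.stem import WordNetLemmatizer

    with open(path, "r") as f:
        text = f.read()
    return lemmatize_text(text, WordNetLemmatizer().lemmatize)
```

Calling lemfile() returns the lemmatized words as a list instead of rewriting the file. Note that WordNetLemmatizer.lemmatize treats every word as a noun unless you pass a pos argument, so verbs may come back unchanged.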