I am trying to lemmatize the text in a DataFrame column, but the function I wrote isn't working. Before lemmatizing, the data in the column looked like this.
Then I ran the following code:
import nltk
nltk.download('wordnet')
from nltk.stem import WordNetLemmatizer
# Init the Wordnet Lemmatizer
lemmatizer = WordNetLemmatizer()
def lemmatize_text(text):
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(w) for w in text]
df['content'] = df["content"].apply(lemmatize_text)
print(df.content)
Now the content column looks like this:
I'm not sure what I did wrong; I am just trying to lemmatize the data in the content column. Any help would be greatly appreciated.
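You are lemmatizing each character instead of each word. Iterating over a string with `for w in text` yields one character at a time, so the string has to be split into words first and the results joined back together. Your function should look like this instead: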
def lemmatize_text(text):
    lemmatizer = WordNetLemmatizer()
    return ' '.join([lemmatizer.lemmatize(w) for w in text.split(' ')])
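To see the difference without downloading WordNet, here is a minimal sketch that swaps the real lemmatizer for a toy lookup table. `TOY_LEMMAS`, `toy_lemmatize`, `lemmatize_chars`, and `lemmatize_words` are made-up names for illustration only, not part of NLTK; the sample sentence is invented as well. It also uses `split()` with no argument, which is an optional variation that tolerates repeated spaces.

```python
# Toy stand-in for WordNetLemmatizer().lemmatize (hypothetical, illustration only).
TOY_LEMMAS = {"cats": "cat", "dogs": "dog", "feet": "foot"}


def toy_lemmatize(token):
    # Return the known lemma, or the token unchanged if it is not in the table.
    return TOY_LEMMAS.get(token, token)


def lemmatize_chars(text):
    # Same shape as the function in the question: iterating over a str
    # yields single characters, so each character is "lemmatized" on its own.
    return [toy_lemmatize(w) for w in text]


def lemmatize_words(text):
    # Split into words first, lemmatize each word, then join back into a string.
    # split() with no argument splits on any run of whitespace.
    return " ".join(toy_lemmatize(w) for w in text.split())


sample = "cats chase dogs"
print(lemmatize_chars(sample)[:4])  # ['c', 'a', 't', 's']
print(lemmatize_words(sample))      # cat chase dog
```

With the real lemmatizer, the word-level version is applied to the column exactly as in the question: df['content'] = df['content'].apply(lemmatize_text).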