I read a CSV file into a pandas DataFrame.
My text column is df['story'].
How do I lemmatize this column?
Should I tokenize it first?
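No, you don't need to tokenize first. A Stanza pipeline runs its own tokenizer (the `tokenize` processor, plus `mwt` for multi-word tokens) and POS tagger before the lemmatizer, which uses those POS tags. So you can pass each raw string straight in. You can try the following code: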
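```python
import stanza
import pandas as pd

# Run stanza.download('en') once beforehand if the English models are not installed yet
nlp = stanza.Pipeline(lang='en', processors='tokenize,mwt,pos,lemma')

def lemmatize_text(text):
    if not isinstance(text, str) or not text.strip():
        return ''  # NaN or blank cell: nothing to lemmatize
    doc = nlp(text)  # tokenizes, POS-tags and lemmatizes the raw string
    # Fall back to the surface form if a word comes back without a lemma
    lemmas = [word.lemma or word.text for sent in doc.sentences for word in sent.words]
    return ' '.join(lemmas)

df['lemmatized_story'] = df['story'].apply(lemmatize_text)
```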
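`apply` calls the pipeline once per row, which can be slow on a large CSV. Stanza can also process a list of `stanza.Document` objects in a single call, so the whole column can be batched. The sketch below assumes `nlp` is the pipeline built above. The helper names `lemmatize_all` and `doc_to_lemmas` are hypothetical. The Document class is passed in as a parameter (`stanza.Document` in real use) only so the helper can be tried without loading models.

```python
from typing import Callable, Iterable, List


def doc_to_lemmas(doc) -> str:
    """Join the lemmas of one processed Stanza document into a single string."""
    # Fall back to the surface text if a word has no lemma.
    return ' '.join(word.lemma or word.text
                    for sent in doc.sentences for word in sent.words)


def lemmatize_all(texts: Iterable, nlp: Callable, document_cls) -> List[str]:
    """Lemmatize many texts with one batched pipeline call (hypothetical helper).

    nlp          -- a stanza.Pipeline with 'tokenize,mwt,pos,lemma'
    document_cls -- stanza.Document in real use
    """
    # Treat NaN / non-string cells as empty and skip blank rows entirely.
    cleaned = [t if isinstance(t, str) else '' for t in texts]
    to_run = [i for i, t in enumerate(cleaned) if t.strip()]
    # Wrap each non-blank text in a Document and run them all in one call.
    in_docs = [document_cls([], text=cleaned[i]) for i in to_run]
    out_docs = nlp(in_docs) if in_docs else []
    # Put results back in the original row order; blank rows stay ''.
    results = [''] * len(cleaned)
    for i, doc in zip(to_run, out_docs):
        results[i] = doc_to_lemmas(doc)
    return results
```

Usage, assuming `nlp` and `df` from above:

    df['lemmatized_story'] = lemmatize_all(df['story'], nlp, stanza.Document)

The function returns a plain list with one entry per row, so it can be assigned directly as a new column.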