How to apply lemmatization to a column in a pandas DataFrame


Suppose I have the following dataframe:

import pandas as pd

d = {'col1': ['challenging', 'swimming'], 'col2': [3, 4]}
df = pd.DataFrame(data=d)

Output
          col1  col2
0  challenging     3
1     swimming     4

I am using the WordNetLemmatizer:

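from nltk.stem import WordNetLemmatizer
wordnet_lemmatizer = WordNetLemmatizer()  # requires the NLTK WordNet corpus to be installed
print(wordnet_lemmatizer.lemmatize('challenging', pos='v'))
print(wordnet_lemmatizer.lemmatize('swimming', pos='v'))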

Output
challenge
swim

How can I apply this lemmatization function to all elements of col1 from the original dataframe?

I tried the following, but the dataframe is unchanged: without a pos argument, lemmatize treats each word as a noun (the default is pos='n'), and I need pos='v':

df['col1'] = df['col1'].apply(wordnet_lemmatizer.lemmatize)

If I try:

df['col1'] = df['col1'].apply(wordnet_lemmatizer.lemmatize(pos='v'))

I get

TypeError: lemmatize() missing 1 required positional argument: 'word'

The desired output is:

        col1  col2
0  challenge     3
1       swim     4

Solution

  • Wrap the call in a lambda inside apply, so that each word is passed to lemmatize together with pos='v'.

    df['col1'] = df['col1'].apply(lambda word: wordnet_lemmatizer.lemmatize(word, pos='v'))
    print(df)
    
            col1  col2
    0  challenge     3
    1       swim     4
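
An equivalent way to fix the pos argument is functools.partial, sketched below. To keep the snippet runnable without the WordNet corpus, toy_lemmatize is a hypothetical stand-in (not part of NLTK) that only mimics the (word, pos) signature of wordnet_lemmatizer.lemmatize. With the real lemmatizer you would presumably pass wordnet_lemmatizer.lemmatize to partial instead. The sketch also reproduces the TypeError from the question: writing lemmatize(pos='v') calls the function right away, with no word, instead of handing apply a callable.

```python
from functools import partial

import pandas as pd


def toy_lemmatize(word, pos="n"):
    """Hypothetical stand-in with the same (word, pos) signature as
    wordnet_lemmatizer.lemmatize; it only knows the two example verbs."""
    verbs = {"challenging": "challenge", "swimming": "swim"}
    return verbs.get(word, word) if pos == "v" else word


df = pd.DataFrame({"col1": ["challenging", "swimming"], "col2": [3, 4]})

# Calling the function with only pos='v' runs it immediately, without a word.
# This is the same mistake that raised the TypeError in the question.
try:
    toy_lemmatize(pos="v")
except TypeError as exc:
    error_message = str(exc)

# partial() returns a new callable with pos fixed to 'v';
# apply() then supplies each value of col1 as the word argument.
lemmatize_verb = partial(toy_lemmatize, pos="v")
df["col1"] = df["col1"].apply(lemmatize_verb)
print(df)
```

Both the lambda and partial give apply a one-argument callable; which one to use is mostly a matter of taste.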