Tags: python, dataframe, text, nltk, lemmatization

Lemmatizing a verb list in a DataFrame in Python


I want to ask what seems like a simple question of the Python wizards here (I am a total newbie, so I have no idea how simple or complex this question is)!

I have a list of verbs in a DataFrame that looks like this:

id verb
15 believe
64 start
90 believe

I want to lemmatize it. The problem is that most lemmatization examples work on whole sentences, where each word's part of speech can be inferred from context. My data has no such context, but every entry is a verb and I only need verb lemmas.

Do you have any ideas on how to lemmatize this verb list? Many thanks in advance for considering my question!


Solution

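  • If you are asking how to apply a function to a pandas DataFrame column, you can use Series.map. Because every word in your column is a verb, you can pass pos="v" to WordNetLemmatizer.lemmatize instead of relying on sentence context: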

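    import nltk
    import pandas as pd
    from nltk.stem import WordNetLemmatizer
    
    # WordNetLemmatizer needs the WordNet corpus; download it once if missing.
    nltk.download("wordnet", quiet=True)
    nltk.download("omw-1.4", quiet=True)  # some NLTK releases also expect this package
    
    data = pd.DataFrame({
        "id": [1, 2, 3, 4],
        "verb": ["believe", "start", "believed", "starting"],
    })
    # https://www.nltk.org/_modules/nltk/stem/wordnet.html
    wnl = WordNetLemmatizer()
    # pos="v" makes WordNet treat every word as a verb, so no sentence context is needed.
    data["verb"] = data["verb"].map(lambda word: wnl.lemmatize(word, pos="v"))
    
    print(data)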
    

    Output

       id     verb
    0   1  believe
    1   2    start
    2   3  believe
    3   4    start