Tags: python, nlp, stemming, lemmatization

Stemming and lemmatizing words


I have a text document that I need to apply stemming and lemmatization to. I have already cleaned the data, tokenised it, and removed stop words.

What I need to do is take the token list as input and return a dict with the keys 'original', 'stem' and 'lemma', where the values are the nth word in its original, stemmed and lemmatized forms.

  The Snowball stemmer is defined as Stemmer()
  and the WordNetLemmatizer is defined as lemmatizer().

Here's the code I've written, but it gives an error:

    def find_roots(token_list, n):
        n = 2
        original = tokens
        stem = [ele for sub in original for idx, ele in
                enumerate(sub.split()) if idx == (n - 1)]
        stem = stemmer(stem)
        lemma = [ele for sub in original for idx, ele in
                 enumerate(sub.split()) if idx == (n - 1)]
        lemma = lemmatizer()
        return

Any help would be appreciated.


Solution

  • I really don't understand what you are trying to do in the list comprehensions, so I'll just write how I would do it:

    from nltk import WordNetLemmatizer, SnowballStemmer
    
    lemmatizer = WordNetLemmatizer()
    stemmer = SnowballStemmer("english")
    
    
    def find_roots(token_list, n):
        token = token_list[n]
        stem = stemmer.stem(token)
        lemma = lemmatizer.lemmatize(token)
        return {"original": token, "stem": stem, "lemma": lemma}
    
    
    roots_dict = find_roots(["said", "talked", "walked"], n=2)
    print(roots_dict)
    > {'original': 'walked', 'stem': 'walk', 'lemma': 'walked'}
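
Two details of the answer above may be worth adjusting, so here is a minimal sketch of a variant. First, `token_list[n]` is zero-based, while the question's `idx == (n - 1)` suggests the asker counts from 1. Second, the lemma comes back as 'walked' because `WordNetLemmatizer.lemmatize` treats words as nouns unless a `pos` argument is given. The names `stem_fn`, `lemma_fn` and `nltk_find_roots` are hypothetical, introduced only for this illustration, and defaulting to `pos="v"` is an assumption that suits verb-heavy input, not a general rule.

```python
from typing import Callable, Dict, List


def find_roots(
    token_list: List[str],
    n: int,
    stem_fn: Callable[[str], str],
    lemma_fn: Callable[[str], str],
) -> Dict[str, str]:
    """Return the n-th token (1-based, as in the question) with its stem and lemma."""
    if not 1 <= n <= len(token_list):
        raise IndexError(f"n must be between 1 and {len(token_list)}, got {n}")
    token = token_list[n - 1]
    return {"original": token, "stem": stem_fn(token), "lemma": lemma_fn(token)}


def nltk_find_roots(token_list: List[str], n: int, pos: str = "v") -> Dict[str, str]:
    """Hypothetical helper that wires find_roots to NLTK's stemmer and lemmatizer."""
    # Imported lazily so find_roots itself has no NLTK dependency.
    # Lemmatizing requires the WordNet corpus to be installed.
    from nltk.stem import SnowballStemmer, WordNetLemmatizer

    stemmer = SnowballStemmer("english")
    lemmatizer = WordNetLemmatizer()
    return find_roots(
        token_list,
        n,
        stem_fn=stemmer.stem,
        # Assumption: treat the token as a verb unless the caller says otherwise.
        lemma_fn=lambda word: lemmatizer.lemmatize(word, pos=pos),
    )
```

With the WordNet data installed (for example via `nltk.download("wordnet")`), calling `nltk_find_roots(["said", "talked", "walked"], n=3)` should give a lemma of 'walk' rather than 'walked'. For mixed text, a part-of-speech tagger would be needed to choose `pos` per token.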