
Add a new stemmer to nltk


I have this Python function that works as expected. Is it possible to package the logic as an NLTK stemmer? If so, what changes need to be made?

import itertools, re
def dropdup(mytuple):
    newtup=list()
    for i in mytuple:
        i = i[:-3] if i.endswith('bai') else i
        for r in (("tha", "ta"), ("i", "e")):
            i = i.replace(*r)
            i = re.sub(r'(\w)\1+',r'\1', i)
        newtup.append(''.join(i for i, _ in itertools.groupby(i)))
    return tuple(newtup)

>>> dropdup(('savithabai', 'samiiir', 'aaaabaa'))
('saveta', 'samer', 'aba')

I would like users to be able to import it with something like this:

from nltk.stemmer import indianNameStemmer

There are a few more rules to be added to the logic. I just want to know if this is a valid (pythonic) idea.


Solution

  • First, see https://www.python-course.eu/python3_inheritance.php for background on class inheritance.

    Create a file mystemmer.py:

    import itertools, re
    from nltk.stem import StemmerI
    
    class MyStemmer(StemmerI):
        def stem(self, token):
            # Strip the 'bai' suffix; assign back to token so the rules below see it.
            token = token[:-3] if token.endswith('bai') else token
            for r in (("tha", "ta"), ("i", "e")):
                token = token.replace(*r)
                # Collapse runs of a repeated word character into one.
                token = re.sub(r'(\w)\1+',r'\1', token)
            # Collapse any remaining runs of identical characters.
            return ''.join(i for i, _ in itertools.groupby(token))
    

    Usage:

    >>> from mystemmer import MyStemmer
    >>> s = MyStemmer()
    >>> s.stem('savithabai')
    'saveta'
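
Since the question mentions that more rules will be added, one possible variation keeps the rules in class attributes so they can be extended without touching `stem`. This is only a minimal sketch and not part of NLTK: the class name `IndianNameStemmer` and the attribute names `suffixes` and `replacements` are made up for illustration, and the fallback to `object` is there only so the snippet still runs where NLTK is not installed.

```python
import itertools
import re

try:
    from nltk.stem import StemmerI  # NLTK's abstract stemmer interface
except ImportError:
    # Assumption for this sketch: fall back to a plain base class without NLTK.
    StemmerI = object


class IndianNameStemmer(StemmerI):
    """Hypothetical stemmer; the rules are copied from the question's dropdup()."""

    # Rule tables (illustrative names): add new rules here.
    suffixes = ("bai",)
    replacements = (("tha", "ta"), ("i", "e"))

    def stem(self, token):
        # Remove the first matching suffix, if any.
        for suffix in self.suffixes:
            if token.endswith(suffix):
                token = token[: -len(suffix)]
                break
        # Apply each replacement, collapsing repeated word characters after each one.
        for old, new in self.replacements:
            token = token.replace(old, new)
            token = re.sub(r"(\w)\1+", r"\1", token)
        # Collapse any remaining runs of identical characters.
        return "".join(ch for ch, _ in itertools.groupby(token))


stemmer = IndianNameStemmer()
names = ("savithabai", "samiiir", "aaaabaa")
print(tuple(stemmer.stem(name) for name in names))
# ('saveta', 'samer', 'aba')
```

Users would import such a class from your own module (for example the `mystemmer` file above) rather than from the `nltk` namespace, unless the class were contributed to NLTK itself.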