Tags: python, nlp, suffix-tree

Root identification in a list of words in Python


I am new to Python. I am trying to identify the common root of the words in a list, but my code doesn't work. Here is what I have tried:

listData = ["blackish", "blacken", "blacked"]

The output I expect is:

root = "black" and suffixLi = ["ish", "en", "ed"]

The rest of my code:

def root():
    i=0
    j=0
    string = ""
    for word in listData:
        for i in range(len(min(listData, key=len))-1):
            print(len(min(listData, key=len)))
            if (listData[i][j]==listData[i+1][j]):
                string=string+listData[i][j]
                print(listData[i][j])
                print(string)
            i=i+1
            j=j+1
    print(string)  
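For comparison, here is a minimal corrected sketch that keeps the question's character-by-character idea, assuming the goal is the longest common prefix. The function name `common_root` is hypothetical. The loop above fails because `i` is used both as a word index and as a character position, so `listData[i + 1]` runs past the three-element list and raises `IndexError`; reassigning `i` inside a `for` loop also has no effect on the next iteration. The sketch instead compares position `j` across all words at once.

```python
def common_root(words):
    """Return (root, suffixes), where root is the longest common prefix of words."""
    if not words:
        return "", []
    shortest = min(words, key=len)
    root = ""
    # j walks character positions; the check compares position j across ALL words.
    for j in range(len(shortest)):
        ch = words[0][j]
        if any(word[j] != ch for word in words):
            break  # first mismatch ends the common prefix
        root += ch
    # Whatever follows the common prefix in each word is its suffix.
    suffixes = [word[len(root):] for word in words]
    return root, suffixes


listData = ["blackish", "blacken", "blacked"]
print(common_root(listData))  # ('black', ['ish', 'en', 'ed'])
```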

Solution

  • Presuming you are trying to find the common prefix:

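    def root_pre(words):
        """Return the longest common prefix of `words` and the remaining suffixes."""
        root = ""
        # zip(*words) yields tuples holding the i-th character of every word,
        # stopping at the end of the shortest word.
        for chars in zip(*words):
            # Stop at the first position where the words disagree.
            if not all(c == chars[0] for c in chars):
                break
            root += chars[0]
        ln = len(root)
        # Strip the common prefix from each word (the argument, not a global).
        pres = [s[ln:] for s in words]
        return root, pres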
    
    print(root_pre(listData))
    # Output: ('black', ['ish', 'en', 'ed'])
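The standard library can also compute the prefix directly: `os.path.commonprefix` compares its arguments character by character (not by path component), so it works on ordinary words. This is a minimal sketch of that alternative; the wrapper name `root_and_suffixes` is hypothetical.

```python
import os.path


def root_and_suffixes(words):
    """Split each word into the shared prefix and its remaining suffix."""
    # commonprefix works character by character on any strings,
    # and returns "" for an empty list.
    root = os.path.commonprefix(words)
    return root, [w[len(root):] for w in words]


print(root_and_suffixes(["blackish", "blacken", "blacked"]))
# ('black', ['ish', 'en', 'ed'])
```

Keep in mind that a common prefix is not always a linguistic root: adding "blast" to the list shrinks the prefix to "bla".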