python, algorithm, data-structures, depth-first-search, trie

How do we compare two tries for similarity?


I was curious whether there is a way to compare two trie data structures and find the words they have in common.

trie1                      trie2

   root                     root 
/     |                   /   |
m     b                   m   b
|     |                   |   |
a     o                   a   o
| \   |                   |   |
t  x  b                   x   b

def compare_trie(trie1, trie2):
    pass

Output: ["max", "bob"]

Edit: So far I have tried to implement a DFS algorithm, but I am stuck on how to manage two stacks for the two different tries.

The code I tried, which is still stuck on managing two stacks for the two tries:

def compareTrie(trie1, trie2):
    dfsStack = []
    result = []
    stack1 = [x for x in trie1.keys()]
    stack2 = [y for y in trie2.keys()]
    similar = list(set(stack1) & set(stack2))
    dfsStack.append((similar, result))
    while (dfsStack):
        current, result = dfsStack.pop()
        print(current, result)
        result.append(current)
        for c in current:
            trie1 = trie1[c]
            trie2 = trie2[c]
            st1 = [x for x in trie1.keys()]
            st2 = [x for x in trie2.keys()]
            simm = list(set(st1) & set(st2))
            dfsStack.append((simm, result))

    print(result)

Trie Implementation:

def create_trie(words):
    trie = {}
    for word in words:
        curr = trie
        for c in word:
            if c not in curr:
                curr[c] = {}
            curr = curr[c]
        # Mark the end of a word
        curr['#'] = True
    return trie


s1 = "mat max bob"
s2 = "max bob"

words1 = s1.split()
words2 = s2.split()

t1 = create_trie(words1)
t2 = create_trie(words2)

Solution

  • Your idea to use DFS was correct; however, you could have opted for a simple recursive approach to solve the task at hand. Here's the recursive version:

    def create_trie(words):
        trie = {}
        for word in words:
            curr = trie
            for c in word:
                if c not in curr:
                    curr[c] = {}
                curr = curr[c]
            # Mark the end of a word
            curr['#'] = True
        return trie
    
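    def compare(trie1, trie2, curr):
        # Follow only the keys present in both tries; curr is the prefix so far.
        for i in trie1:
            if i in trie2:
                if i == "#":
                    # Both tries end a word here, so curr is a shared word.
                    result.append(curr)
                else:
                    compare(trie1[i], trie2[i], curr + i)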
        
    
    s1 = "mat max bob temp2 fg f r"
    s2 = "max bob temp fg r c"
    
    words1 = s1.split()
    words2 = s2.split()
    
    t1 = create_trie(words1)
    t2 = create_trie(words2)
    result = []
    compare(t1, t2, "")
    print(result)  # ['max', 'bob', 'fg', 'r']
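
If you would rather keep the iterative DFS from the question, one way around the two-stack problem is to keep a single stack whose entries pair up the matching nodes of both tries together with the prefix that led to them. The following is only a sketch under that assumption; the name common_words is made up for illustration, and it returns the list instead of appending to a global. The order of the words depends on the stack, so it can differ from the recursive version.

```python
def create_trie(words):
    """Build a nested-dict trie; '#' marks the end of a word, as in the post."""
    trie = {}
    for word in words:
        curr = trie
        for c in word:
            curr = curr.setdefault(c, {})
        curr['#'] = True
    return trie


def common_words(trie1, trie2):
    """Return the words stored in both tries, using one explicit DFS stack."""
    result = []
    # Each entry holds the nodes reached by the same prefix in both tries.
    stack = [(trie1, trie2, "")]
    while stack:
        node1, node2, prefix = stack.pop()
        for key in node1:
            if key not in node2:
                continue  # this branch exists only in trie1
            if key == '#':
                # Both tries end a word at this prefix.
                result.append(prefix)
            else:
                stack.append((node1[key], node2[key], prefix + key))
    return result


t1 = create_trie("mat max bob".split())
t2 = create_trie("max bob".split())
print(sorted(common_words(t1, t2)))  # ['bob', 'max']
```

Because both nodes travel together in one tuple, they can never get out of step, which is what goes wrong when trie1 and trie2 are reassigned inside the loop in the question's attempt.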