Search code examples
pythondictionaryhashmaptrie

Design Add and Search Words Data Structure: Leetcode 211


I am currently trying to solve the problem Add and Search Words Data Structure on leetcode. The question is as follows:

Design a data structure that supports adding new words and finding if a string matches any previously added string.

Implement the WordDictionary class:

WordDictionary() Initializes the object.

void addWord(word) Adds word to the data structure, it can be matched later.

bool search(word) Returns true if there is any string in the data structure that matches word or false otherwise. word may contain dots . where dots can be matched with any letter.

My Strategy:

My strategy involves representing a trie with a hashmap instead of a traditional linked-list-based tree structure, aiming for better performance and lower complexity. By using a hashmap, we can quickly access the next node without traversing through unnecessary nodes, making operations faster especially with large datasets.

For example, when inserting words like apple and application into this structure, it's organized as nested hashmaps where each character in a word points to another hashmap representing the next character. The end of a word is marked with a special key-value pair {'end': {}}. This way, we efficiently store and search for words with minimal space and time complexity.

My Code:

class WordDictionary(object):

    def __init__(self):
        self.map = {}

    def addWord(self, word):
        """
        :type word: str
        :rtype: None
        """
        current = self.map
        for i in word:
            if i in current:
                current = current[i]
            else:
                current[i] = {}
                current = current[i]
        current['end'] = {}
        
        return
        

    def search(self, word):
        """
        :type word: str
        :rtype: bool
        """
        current = self.map
        for i in word:
            if i in current:
                current = current[i]
            elif i == '.':
                current = {key: value for d in current.values() for key, value in d.items()}
            else:
                return False
        if 'end' in current:
            return True
        return False

The solution seems to be effective for the majority of cases, but I've hit a error with test case 16, where it's not giving the right outcome. The length of test case 16 makes it particularly challenging to pinpoint where the mistake is happening. I'm in need of some guidance to track down and fix this logical error. Would you be able to lend a hand in sorting this out?


Solution

  • One thing searching in a trie shares with binary search is that it keeps shrinking the possibilities between a range; except instead of binary, it's character-by-character, and instead of dividing down the binary middle, it uses the position in the word. (It still performs with log n iid, see Tong2016Smoothed.)

    However, when it gets to a wildcard (.), one can't add to the possibilities and do them in parallel. In general, we will have holes in the output such that it is not represented in a single range. For example, when searching for c.t in {cat, cel, cut}, cat and cut are included but cel, in the middle, is not. I think I've modified this code to do this by taking multiple paths.

    class WordDictionary(object):
    
        def __init__(self):
            self.node = None
            self.map = {}
    
        def __repr__(self):
            if self.node and self.map:
                return "\"{0}\".{1}".format(self.node, self.map)
            elif self.node:
                return "\"{0}\"".format(self.node)
            elif self.map:
                return "∅.{0}".format(self.map)
            else:
                return "<error>"
    
        def addWord(self, word: str) -> None:
            current = self
            for i in word:
                if not i in current.map:
                    current.map[i] = WordDictionary()
                current = current.map[i]
            current.node = word
    
        def match(self, word: str) -> str:
            if not word:
                """
                in-order sub-trie traversal will enumerate the words that have
                `word` as a prefix (hence prefix-tree); (not used)
                """
                if self.node:
                    yield self.node
                return
            i = word[0]
            rest = word[1:]
            if i == '.':
                for wild in self.map.values():
                    yield from wild.match(rest)
            elif i in self.map:
                yield from self.map[i].match(rest)
    
        def search(self, word: str) -> bool:
            return next(self.match(word), None) != None
    
    words = WordDictionary()
    words.addWord("zulu")
    words.addWord("november")
    words.addWord("mike")
    words.addWord("yankee")
    words.addWord("cat")
    words.addWord("cate")
    words.addWord("cel")
    words.addWord("cut")
    words.addWord("cute")
    print("words: {0}".format(words))
    
    print("first match a:", next(words.match("a"), None))
    print("first match mi..:", next(words.match("mi.."), None))
    print("search a:", words.search("a"))
    print("search c.t:", words.search("c.t"))
    print("search z.lu:", words.search("z.lu"))
    print("search mi..:", words.search("mi.."))
    print("search ..:", words.search(".."))
    print("search ....:", words.search("...."))
    print("search c.te", words.search("c.te"))
    

    The match function is a (recursive) generator for all the words that could be possible using the string with wildcards. The self.node string is a more expressive form of the end sentinel; end could be used, but the way I've written search requires that any words are returned from match. (Sorry for the python3 instead of python, this is not my usual language at all.)