Search code examples
algorithmdata-structuresstringsuffix-tree

Successive adding of char to get the longest word in the dictionary


Given a dictionary of words and an initial character. find the longest possible word in the dictionary by successively adding a character to the word. At any given instance the word should be valid word in the dictionary.

ex : a -> at -> cat -> cart -> chart ....


Solution

  • The brute force approach would be to try adding letters to each available index using a depth-first search.

    So, starting with 'a', there are two places you can add a new letter. In front or behind the 'a', represented by dots below.

    .a.

    If you add a 't', there are now three positions.

    .a.t.

    You can try adding all 26 letters to each available position. The dictionary in this case can be a simple hashtable. If you add a 'z' in the middle, you get 'azt' which would not be in the hashtable so you don't continue down that path in the search.

    Edit: Nick Johnson's graph made me curious what a graph of all maximal paths would look like. It's a large (1.6 MB) image here:

    http://www.michaelfogleman.com/static/images/word_graph.png

    Edit: Here's a Python implementation. The brute-force approach actually runs in a reasonable amount of time (a few seconds, depending on the starting letter).

    import heapq
    
    letters = 'abcdefghijklmnopqrstuvwxyz'
    
    def search(words, word, path):
        path.append(word)
        yield tuple(path)
        for i in xrange(len(word)+1):
            before, after = word[:i], word[i:]
            for c in letters:
                new_word = '%s%s%s' % (before, c, after)
                if new_word in words:
                    for new_path in search(words, new_word, path):
                        yield new_path
        path.pop()
    
    def load(path):
        result = set()
        with open(path, 'r') as f:
            for line in f:
                word = line.lower().strip()
                result.add(word)
        return result
    
    def find_top(paths, n):
        gen = ((len(x), x) for x in paths)
        return heapq.nlargest(n, gen)
    
    if __name__ == '__main__':
        words = load('TWL06.txt')
        gen = search(words, 'b', [])
        top = find_top(gen, 10)
        for path in top:
            print path
    

    Of course, there will be a lot of ties in the answer. This will print the top N results, measured by length of the final word.

    Output for starting letter 'a', using the TWL06 Scrabble dictionary.

    (10, ('a', 'ta', 'tap', 'tape', 'taped', 'tamped', 'stamped', 'stampede', 'stampedes', 'stampeders'))
    (10, ('a', 'ta', 'tap', 'tape', 'taped', 'tamped', 'stamped', 'stampede', 'stampeder', 'stampeders'))
    (10, ('a', 'ta', 'tan', 'tang', 'stang', 'strang', 'strange', 'strangle', 'strangles', 'stranglers'))
    (10, ('a', 'ta', 'tan', 'tang', 'stang', 'strang', 'strange', 'strangle', 'strangler', 'stranglers'))
    (10, ('a', 'ta', 'tan', 'tang', 'stang', 'strang', 'strange', 'stranges', 'strangles', 'stranglers'))
    (10, ('a', 'ta', 'tan', 'tang', 'stang', 'strang', 'strange', 'stranges', 'strangers', 'stranglers'))
    (10, ('a', 'ta', 'tan', 'tang', 'stang', 'strang', 'strange', 'stranges', 'strangers', 'estrangers'))
    (10, ('a', 'ta', 'tan', 'tang', 'stang', 'strang', 'strange', 'stranges', 'estranges', 'estrangers'))
    (10, ('a', 'ta', 'tan', 'tang', 'stang', 'strang', 'strange', 'stranger', 'strangler', 'stranglers'))
    (10, ('a', 'ta', 'tan', 'tang', 'stang', 'strang', 'strange', 'stranger', 'strangers', 'stranglers'))
    

    And here are the results for each starting letter. Of course, an exception is made that the single starting letter doesn't have to be in the dictionary. Just some 2-letter word that can be formed with it.

    (10, ('a', 'ta', 'tap', 'tape', 'taped', 'tamped', 'stamped', 'stampede', 'stampedes', 'stampeders'))
    (9, ('b', 'bo', 'bos', 'bods', 'bodes', 'bodies', 'boodies', 'bloodies', 'bloodiest'))
    (1, ('c',))
    (10, ('d', 'od', 'cod', 'coed', 'coped', 'comped', 'compted', 'competed', 'completed', 'complected'))
    (10, ('e', 're', 'rue', 'ruse', 'ruses', 'rouses', 'arouses', 'carouses', 'carousels', 'carrousels'))
    (9, ('f', 'fe', 'foe', 'fore', 'forge', 'forges', 'forgoes', 'forgoers', 'foregoers'))
    (10, ('g', 'ag', 'tag', 'tang', 'stang', 'strang', 'strange', 'strangle', 'strangles', 'stranglers'))
    (9, ('h', 'sh', 'she', 'shes', 'ashes', 'sashes', 'slashes', 'splashes', 'splashers'))
    (11, ('i', 'pi', 'pin', 'ping', 'oping', 'coping', 'comping', 'compting', 'competing', 'completing', 'complecting'))
    (7, ('j', 'jo', 'joy', 'joky', 'jokey', 'jockey', 'jockeys'))
    (9, ('k', 'ki', 'kin', 'akin', 'takin', 'takins', 'takings', 'talkings', 'stalkings'))
    (10, ('l', 'la', 'las', 'lass', 'lassi', 'lassis', 'lassies', 'glassies', 'glassines', 'glassiness'))
    (10, ('m', 'ma', 'mas', 'mars', 'maras', 'madras', 'madrasa', 'madrassa', 'madrassas', 'madrassahs'))
    (11, ('n', 'in', 'pin', 'ping', 'oping', 'coping', 'comping', 'compting', 'competing', 'completing', 'complecting'))
    (10, ('o', 'os', 'ose', 'rose', 'rouse', 'rouses', 'arouses', 'carouses', 'carousels', 'carrousels'))
    (11, ('p', 'pi', 'pin', 'ping', 'oping', 'coping', 'comping', 'compting', 'competing', 'completing', 'complecting'))
    (3, ('q', 'qi', 'qis'))
    (10, ('r', 're', 'rue', 'ruse', 'ruses', 'rouses', 'arouses', 'carouses', 'carousels', 'carrousels'))
    (10, ('s', 'us', 'use', 'uses', 'ruses', 'rouses', 'arouses', 'carouses', 'carousels', 'carrousels'))
    (10, ('t', 'ti', 'tin', 'ting', 'sting', 'sating', 'stating', 'estating', 'restating', 'restarting'))
    (10, ('u', 'us', 'use', 'uses', 'ruses', 'rouses', 'arouses', 'carouses', 'carousels', 'carrousels'))
    (1, ('v',))
    (9, ('w', 'we', 'wae', 'wake', 'wakes', 'wackes', 'wackest', 'wackiest', 'whackiest'))
    (8, ('x', 'ax', 'max', 'maxi', 'maxim', 'maxima', 'maximal', 'maximals'))
    (8, ('y', 'ye', 'tye', 'stye', 'styed', 'stayed', 'strayed', 'estrayed'))
    (8, ('z', 'za', 'zoa', 'zona', 'zonae', 'zonate', 'zonated', 'ozonated'))