Tags: python, pyenchant

Efficient hunting for words in scrambled letters


I guess you could classify this as a Scrabble-style problem, but it started out because a friend mentioned the UK TV quiz show Countdown. Various rounds in the show involve the contestants being presented with a scrambled set of letters, and they have to come up with the longest word they can. The one my friend mentioned was "RAEPKWAEN".

In fairly short order I whipped up something in Python to handle this problem, using PyEnchant for the dictionary look-ups; however, I'm noticing that it really can't scale all that well.

Here's what I have currently:

#!/usr/bin/python

from itertools import permutations
import enchant
from sys import argv

def find_longest(origin):
    # Brute force: try every permutation of every length, longest first,
    # and return the first one the dictionary recognises.
    s = enchant.Dict("en_US")
    for i in range(len(origin), 0, -1):
        print "Checking against words of length %d" % i
        pool = permutations(origin, i)
        for comb in pool:
            word = ''.join(comb)
            if s.check(word):
                return word
    return ""

if __name__ == '__main__':
    result = find_longest(argv[1])
    print result

That's fine on a 9-letter example like they use in the show: 9 factorial = 362,880 and 8 factorial = 40,320. At that scale, even if it had to check every permutation at every word length, that's not many.

However, once you reach 14 characters that's 87,178,291,200 possible permutations, meaning you're relying on luck for a 14-character word to be found quickly.

With the example word above it takes my machine about 12.5 seconds to find "reawaken". With 14-character scrambled words we could be talking on the scale of 23 days just to check all possible 14-character permutations.
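
For reference, those permutation counts are easy to reproduce with a quick sanity check (illustrative only, not part of the script above):

from math import factorial

print factorial(9)                    # 362,880
print factorial(8)                    # 40,320
print factorial(14)                   # 87,178,291,200
print factorial(14) // factorial(9)   # 240,240 -- how many times more 14-letter orderings there are than 9-letter ones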

Is there any more efficient way to handle this?


Solution

  • An implementation of Jeroen Coupé's idea from his answer, using letter counts (a short illustration of the check follows the notes below):

    from collections import defaultdict, Counter


    def find_longest(origin, known_words):
        # Return only the first (i.e. longest) word the generator yields.
        return iter_longest(origin, known_words).next()

    def iter_longest(origin, known_words, min_length=1):
        # Scan the length buckets from longest to shortest, yielding every
        # word whose letters can all be taken from `origin`.
        origin_map = Counter(origin)
        for i in xrange(len(origin) + 1, min_length - 1, -1):
            for word in known_words[i]:
                if check_same_letters(origin_map, word):
                    yield word

    def check_same_letters(origin_map, word):
        # The word is buildable if it needs no letter more often than
        # `origin` provides it.
        new_map = Counter(word)
        return all(new_map[let] <= origin_map[let] for let in word)

    def load_words_from(file_path):
        # Bucket the dictionary by word length so the longest buckets can be
        # scanned first.
        known_words = defaultdict(list)
        with open(file_path) as f:
            for line in f:
                word = line.strip()
                known_words[len(word)].append(word)
        return known_words

    if __name__ == '__main__':
        known_words = load_words_from('words_list.txt')
        origin = 'raepkwaen'
        big_origin = 'raepkwaenaqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnm'
        print find_longest(big_origin, known_words)
        print list(iter_longest(origin, known_words, 5))
    

    Output (for my small 58,000-word dictionary):

    counterrevolutionaries
    ['reawaken', 'awaken', 'enwrap', 'weaken', 'weaker', 'apnea', 'arena', 'awake',
     'aware', 'newer', 'paean', 'parka', 'pekan', 'prank', 'prawn', 'preen', 'renew',
     'waken', 'wreak']
    

    Notes:

    • It's a simple implementation without optimizations.

    • words_list.txt can be /usr/share/dict/words on Linux.
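
    To make the letter-count check concrete, here is a small self-contained sketch of the same test that check_same_letters performs, run against the example letters (illustrative only; expected results are shown as comments):

    from collections import Counter

    origin_map = Counter('raepkwaen')   # a:2, e:2, r:1, p:1, k:1, w:1, n:1
    for word in ['reawaken', 'pranked', 'appear']:
        new_map = Counter(word)
        print word, all(new_map[let] <= origin_map[let] for let in word)
    # Expected: reawaken True (all counts fit), pranked False (no 'd'),
    # appear False (needs two p's, only one available).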

    UPDATE

    If we only need to find one word, and we have a dictionary with words sorted by length (longest first), e.g. produced by this script:

    with open('words_list.txt') as f:
        words = f.readlines()
    with open('words_by_len.txt', 'w') as f:
        # Longest words first, so a linear scan can stop early.
        for word in sorted(words, key=lambda w: len(w), reverse=True):
            f.write(word)
    

    then we can find the longest word without loading the full dictionary into memory:

    from collections import Counter
    import sys


    def check_same_letters(origin_map, word):
        # Same check as above: the word must not need any letter more often
        # than `origin` provides it.
        new_map = Counter(word)
        return all(new_map[let] <= origin_map[let] for let in word)

    def iter_longest_from_file(origin, file_path, min_length=1):
        origin_map = Counter(origin)
        origin_len = len(origin)
        with open(file_path) as f:
            for line in f:
                word = line.strip()
                if len(word) > origin_len:
                    # Too long to ever match; keep scanning down the file.
                    continue
                if len(word) < min_length:
                    # The file is sorted longest-first, so nothing useful remains.
                    return
                if check_same_letters(origin_map, word):
                    yield word

    def find_longest_from_file(origin, file_path):
        # Return only the first (i.e. longest) match.
        return iter_longest_from_file(origin, file_path).next()

    if __name__ == '__main__':
        origin = sys.argv[1] if len(sys.argv) > 1 else 'abcdefghijklmnopqrstuvwxyz'
        print find_longest_from_file(origin, 'words_by_len.txt')
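
    As a side note on the check itself, the same "can this word be built from these letters" test can also be written with Counter subtraction, since subtracting Counters keeps only positive counts; this is an equivalent sketch, not part of the original answer:

    from collections import Counter

    def can_build(word, origin_map):
        # Counter subtraction drops zero and negative counts, so the result is
        # empty (falsy) exactly when origin_map covers every letter of word.
        return not (Counter(word) - origin_map)

    origin_map = Counter('raepkwaen')
    print can_build('reawaken', origin_map)    # True
    print can_build('weakener', origin_map)    # False -- needs three e's, only two available

    Behaviour is identical to check_same_letters above; which version reads more clearly is a matter of taste.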