I need to find all possible words that could be made using letters specified by the user. User can use "?" - as wildcards (up to 2 wildcards). Max input is 15 characters, including those wildcards. Example input: "abcdefghijklm??".
Currently i have 2_700_000 words stored in a Trie. I look it up like that:
def search_word(node, wildcards, alphabet, tiles, output)
output.push(node.full_state) if node.terminal? # append output if its a word
unless tiles.empty? && wildcards.zero?
tiles.uniq.each do |tile|
unless node.walk(tile).nil? # choose only those letters that could possibly make a word
next_node = node.walk(tile)
remaining_tiles = take_away(tiles, tile)
search_word(next_node, wildcards, alphabet, remaining_tiles, output)
end
end
end
if wildcards > 0
other_letters = take_away(alphabet, tiles.uniq) # letters that weren't used in previous loop
other_letters.each do |tile|
unless node.walk(tile).nil? # only those that could make a word
next_node = node.walk(tile)
remaining_tiles = take_away(tiles, tile)
remaining_wildcards = wildcards - 1
search_word(next_node, remaining_wildcards, alphabet, tiles, output)
end
end
end
end
What it does can be described as:
def searchword(trie, wildcards, tiles, output):
if trie is word:
output.append(trie.word) # or send current letters as arguments to this function
for each unique tile in tiles:
if trie has tile:
searchword(trie[tile], wildcards, (tiles - tile), output)
if wildcards > 0:
for each key in trie that has not already been searched in previous loop:
searchword(trie[key], (wildcards - 1), tiles, output)
Speed tests: 15 letter input, no wildcards: 0,45 s 15 letter input, one wildcard: 3,54 s 15 letter input, two wildcards: 15,59s
There are many scrabble solvers online, which do such tasks under 1 second.
QUESTION How to speed up the process, so i takes under 1 second each time? I'm considering: a) writing the search method in C b) reorganizing Trie so words are stored as their letters sorted alphabetically (ex. fast -> afst) - that would reduce number of searches I'm willing to listen if you knew any better approaches for this particular task. c) switching my Trie for a hash table?
You should go with option b, by letters in order store all anagrams. Not revisiting letters is a huge win.
Except do not sort the alphabetically. Sort the letters in this order: etaoinshrdlcumwfgypbvkjxqz
. That is most frequent to least frequent. The idea is to only look at common letters a minimal number of times, rather than over and over again.