Search code examples
rubyalgorithmsearchtrie

Speeding up word search in a Trie


I need to find all possible words that could be made using letters specified by the user. User can use "?" - as wildcards (up to 2 wildcards). Max input is 15 characters, including those wildcards. Example input: "abcdefghijklm??".

Currently i have 2_700_000 words stored in a Trie. I look it up like that:

def search_word(node, wildcards, alphabet, tiles, output) 

  output.push(node.full_state) if node.terminal? # append output if its a word
  unless tiles.empty? && wildcards.zero?
    tiles.uniq.each do |tile|
      unless node.walk(tile).nil?  # choose only those letters that could possibly make a word
        next_node = node.walk(tile)
        remaining_tiles = take_away(tiles, tile)
        search_word(next_node, wildcards, alphabet, remaining_tiles, output) 
      end
    end
  end

  if wildcards > 0
    other_letters = take_away(alphabet, tiles.uniq)  # letters that weren't used in previous loop
    other_letters.each do |tile|
      unless node.walk(tile).nil? # only those that could make a word
        next_node = node.walk(tile)
        remaining_tiles = take_away(tiles, tile)
        remaining_wildcards = wildcards - 1
        search_word(next_node, remaining_wildcards, alphabet, tiles, output) 
      end
    end
  end

end

What it does can be described as:

def searchword(trie, wildcards, tiles, output):
    if trie is word:
        output.append(trie.word) # or send current letters as arguments to this function

    for each unique tile in tiles:
        if trie has tile:
            searchword(trie[tile], wildcards, (tiles - tile), output)

    if wildcards > 0:
        for each key in trie that has not already been searched in previous loop:
            searchword(trie[key], (wildcards - 1), tiles, output)

Speed tests: 15 letter input, no wildcards: 0,45 s 15 letter input, one wildcard: 3,54 s 15 letter input, two wildcards: 15,59s

There are many scrabble solvers online, which do such tasks under 1 second.

QUESTION How to speed up the process, so i takes under 1 second each time? I'm considering: a) writing the search method in C b) reorganizing Trie so words are stored as their letters sorted alphabetically (ex. fast -> afst) - that would reduce number of searches I'm willing to listen if you knew any better approaches for this particular task. c) switching my Trie for a hash table?


Solution

  • You should go with option b, by letters in order store all anagrams. Not revisiting letters is a huge win.

    Except do not sort the alphabetically. Sort the letters in this order: etaoinshrdlcumwfgypbvkjxqz. That is most frequent to least frequent. The idea is to only look at common letters a minimal number of times, rather than over and over again.