I am trying to implement a little game with the following rules: Given a set of random letters (e.g. 10), I want to find all possible words one can form out of these letters. I am using a standard dictionary for this.
Letters are allowed to be used multiple times, and not all letters have to be used, as long it results in words with 4 characters or more. I think this is similar to solving anagrams, except that letters are allowed to be used multiple times.
E.g. Letters given: q r b d t e s Possible words: bedders, dessert, etc.
In search of a data structure supporting O(1) for checking if a proposed word is in the dictionary, I found this paper and subsequently also found a working Java DAWG implementation here.
Here's where I am stuck: When trying to generate a list of possible words one can create from these letters, I am wondering if I am missing something using the DAWG implementation. I have seen other threads on here that are suggestion using a Trie and recursively go through the nodes, but I was hoping I could do something similar with the DAWG.
The implementation I currently have working is by going through all the words of the dictionary, then skip any word that contains a letter that is NOT part of the letters earlier assigned. This works, but is slow, going through all the words of the dictionary * ~20 letters worst-case.
for(String word : dawg.getAllStrings()) {
boolean blacklist = false;
for(int i = 0; i<nonLetters.length(); i++) {
if(word.indexOf(nonLetters.charAt(i)) >= 0) {
blacklist = true;
break;
}
}
if(!blacklist)
possibleWordList.add(word);
}
I tried implementing a recursive word match but struggled as the implementation doesn't expose access to its Node implementation, I could locally change this though.
Without access to the node, I can use dawg.contains(letter), but with the requirement to use letters multiple times I am wondering if this would really help. E.g. I would start with 'A', then 'AA', ... stopping at e.g. 20 A's.
Would all this just be easier with a Trie? Still fairly fast to find matching words, but easier to generate a list of possible words?
Are there other DAWG libraries that expose Node traversing or have a ref implementation to generate all possible words?
Thanks for any help or pointers!
I think this is a good way. I added a recursive method to ModifiableDAWGSet of the DAWG implementation referenced in the question.
getWords is called with a String prefix, adding up the characters. I first implemented this to generate all words in the DAWG and compared to the existing method of the ModifiableDAWGSet.getAllStrings(). Then I added to skip words that didn't contain words, not including Characters from the list of possible characters.
private ArrayList<String> allMatchingWords = new ArrayList<String>();
private ArrayList<Character> possibleCharacters;
private void getWords(ModifiableDAWGNode originNode, String prefix) {
NavigableMap<Character, ModifiableDAWGNode> transitionTreeMap = originNode.getOutgoingTransitions();
for(Map.Entry<Character, ModifiableDAWGNode> transition : transitionTreeMap.entrySet())
{
Character c = transition.getKey();
if(!possibleCharacters.contains(c))
continue;
ModifiableDAWGNode n = transition.getValue();
if(n.isAcceptNode()) //this is a word
{
allMatchingWords.add(prefix + c);
}
getWords(n, prefix + c);
}
}
public ArrayList<String> getAllMatchingWords(Character mustContain, ArrayList<Character> possibleCharacters)
{
allMatchingWords.clear();
this.mustContain = mustContain;
this.possibleCharacters = possibleCharacters;
getWords(sourceNode, "");
return allMatchingWords;
}