I am solving the following problem in unix
Suppose you are playing Scrabble. You have the following seven letters in your rack – E A F N A S M. These are letters you can use to make a word, you can use any number of them in your word, but must use at least one. You are trying to place a word on a spot on the board where there is already a word: ARE.
Your goal is to find a word that will attach to the word ARE with the letters you have in your rack. While usually your letters may be placed before or after the ARE to make a new word, in this case ARE is at the edge of the board, so your word must end in ARE. Your goal is to find all possible words that meet these criteria in the /usr/dict/words using grep.
The command I came up with is really inefficient but works.
grep “^[eafnasm][eafnasm]*are$” /usr/dict/words |
grep -v “a.*a.*a.*a” |
grep -v “e.*e.*e” |
grep -v “f.*f” |
grep -v “n.*n” |
grep -v “s.*s” |
grep -v “m.*m” |
grep -v “^...........”
Is there a more efficient way of doing this?
One way to speed things up would be:
grep -E '^[aefmns]{1,7}are$' /usr/dict/words |
grep -Ev 'a.*a.*a.*a|e.*e.*e|f.*f|n.*n|s.*s|m.*m'
It cuts down the number of processes looking at the data. I removed the second A from the initial character class since it is redundant, but the repetition represents a negligible cost. Using the {1,7}
qualifier in the first pattern means there's no need to filter overlong names in the second.
Note that the first search doesn't allow multiple R's through. It's a specialization for this specific combination of letters-in-hand plus word-on-board. If the hand held an R (instead of, say, the second A), then it would be necessary to filter more than 2 R's out of the results (two because under this scenario, there is one R in hand and one in the word on the board), and the multiple-A filter would have to change too.
Note that the changes here are just minor tweaks to the original 8 grep
commands being run. Since the solution is required to use grep
(ruling out Perl, Python, Awk, ...), you probably can't get to fewer than two commands, one 'positive' grep
to select possibilities, and one 'negative' grep
to eliminate impossibilities. With custom tooling (specialized programs written in C or C++ or something similar), you could probably do better.
If your version of grep
supports PCRE (Perl-compatible regular expressions), you might be able to do it 'all in one'. I'm fairly sure it would be less readable and comprehensible, and while it might perform a little better (less I/O because there's no pipe), the performance improvement would have to be measured. Sometimes, simpler is better.