Search code examples
algorithmprobabilitygreedygame-theory

Optimal Algorithm for Winning Hangman


In the game Hangman, is it the case that a greedy letter-frequency algorithm is equivalent to a best-chance-of-winning algorithm?

Is there ever a case where it's worth sacrificing preservation of your remaining lives, for the sake of a better chance of guessing the correct answer?

Further clarification of the problem:

  • The selected word to be guessed has been taken from a known dictionary.
  • You are given N lives, and thus have to maximise the probability of guessing all the letters in the word without making N mistakes (i.e. you may have an unlimited number of correct guesses).
  • Each word in the dictionary has equal probability, for the sake of this exercise (i.e. the word is chosen randomly)
    • a harder exercise would be to come up with a strategy against a malicious, omniscient word-chooser (I am not asking that here)

Motivation: This question is inspired by the interesting discussion at http://www.datagenetics.com/blog/april12012/index.html

They suggest an algorithm for solving the word game "Hangman" optimally.

Their strategy can be summarised thus (edited for clarification):

  • We can assume the word is drawn from a particular dictionary
  • We know the number of letters in the word
  • Eliminate all words in the dictionary that do not have the correct number of letters.
  • Choose the not-yet-guessed letter which occurs in the largest number of words in the remaining subset of the dictionary.
  • If this letter occurs, we know its location.
  • If this letter does not occur, we know it does not occur in the word.
  • Eliminate all words in the dictionary subset that do not fit exactly this correct pattern, and repeat.
  • If there are 2 (or more) letters appearing equally often, the algorithm may perform a deeper analysis of the positions to determine which one is preferred (if that is reasonable?)

At each stage, we are guessing the letter (not previously guessed) which occurs in the largest number of remaining possible words.

There is some motivation to like this algorithm - we are always minimally likely to lose a life.

But, it strikes me that this isn't necessarily the best solution: if we're trying to guess the word (within a certain number of lives), it's not necessarily always the case that the most frequently occurring letter is the most useful letter to distinguish between the remaining available words.

I'm not sure, though, as it does seem opportune to avoid losing a life wherever possible. Will it ever be the case that optimal strategy permits us to sacrifice a life for a better opportunity to win?

Question: is it the case that this greedy algorithm is equivalent to a best-chance-of-winning algorithm, or not? And can you prove it?

An example dictionary+game would be ideal to show a disproof.


Solution

  • Assume the following dictionary: ABC ABD AEF EGH. (I'll capitalize unguessed letters.)
    Assume you have only 1 life (makes the proof so much easier...).

    The algorithm proposed above:

    Starting with A, you lose (1/4) or are left with aBC aBD aEF (3/4).
    Now guess B and lose (1/3) or are left with abC abD (2/3).
    Now guess C or D and you lose (1/2) or win (1/2).
    Probability to win: 3/4 * 2/3 * 1/2 = 1/4.

    Now try something else:

    Starting with E, you lose (1/2) or are left with AeF eGH (1/2).
    Now you know what to guess and win.
    Probability to win: 1/2.

    Clearly the proposed algorithm is not optimal.