Search code examples
puzzlemorse-code

Translating Morse code with no spaces


I have some Morse code that has lost the spaces in between the letters, my challenge is to find out what the message says. So far I have been kinda lost because of the sheer amount of combinations there might be.

Here is all the info on the messages I have.

  • The output will be English
  • There will always be a translation that make sense
  • Here is and example message -..-...-...-...-..-.-.-.-.-..-.-.-.-.-.-.-.-.-.-..-...-.
  • The messages should be no longer then 70 characters
  • The morse code was taken from a longer stream so it is possible that the first or last groups may be cut off and hence have no valid translation

Does anyone have a clever solution?


Solution

  • This is not an easy problem, because as ruakh suggested there are many viable sentences to a given message. For example 'JACK AND JILL WENT UP THE HILL' has the same encoding as 'JACK AND JILL WALK CHISELED'. Since these are both grammatical sentences and the words in each are common, it's not obvious how to pick one or the other (or any other of the 40141055989476564163599 different sequences of English words that have the same encoding as this message) without delving into natural language processing.

    Anyway, here's a dynamic programming solution to the problem of finding the shortest sentence (with the fewest characters if there's a tie). It can also count the total number of sentences that have the same encoding as the given message. It needs a dictionary of English words in a file.

    The next enhancements should be a better measure of how likely a sentence is: perhaps word frequencies, false-positive rates in morse (eg, "I" is a common word, but it appears often as part of other sequences of morse code sequences). The tricky part will be formulating a good score function that can be expressed in a way that it can be computed using dynamic programming.

    MORSE = dict(zip('ABCDEFGHIJKLMNOPQRSTUVWXYZ', [
        '.-', '-...', '-.-.', '-..', '.', '..-.', '--.', '....',
        '..', '.---', '-.-', '.-..', '--', '-.', '---', '.--.',
        '--.-', '.-.', '...', '-', '..-', '...-', '.--', '-..-',
        '-.--', '--..'
    ]))
    
    # Read a file containing A-Z only English words, one per line.
    WORDS = set(word.strip().upper() for word in open('dict.en').readlines())
    # A set of all possible prefixes of English words.
    PREFIXES = set(word[:j+1] for word in WORDS for j in xrange(len(word)))
    
    def translate(msg, c_sep=' ', w_sep=' / '):
        """Turn a message (all-caps space-separated words) into morse code."""
        return w_sep.join(c_sep.join(MORSE[c] for c in word)
                          for word in msg.split(' '))
    
    def encode(msg):
        """Turn a message into timing-less morse code."""
        return translate(msg, '', '')
    
    def c_trans(morse):
        """Construct a map of char transitions.
    
        The return value is a dict, mapping indexes into the morse code stream
        to a dict of possible characters at that location to where they would go
        in the stream. Transitions that lead to dead-ends are omitted.
        """
        result = [{} for i in xrange(len(morse))]
        for i_ in xrange(len(morse)):
            i = len(morse) - i_ - 1
            for c, m in MORSE.iteritems():
                if i + len(m) < len(morse) and not result[i + len(m)]:
                    continue
                if morse[i:i+len(m)] != m: continue
                result[i][c] = i + len(m)
        return result
    
    def find_words(ctr, i, prefix=''):
        """Find all legal words starting from position i.
    
        We generate all possible words starting from position i in the
        morse code stream, assuming we already have the given prefix.
        ctr is a char transition dict, as produced by c_trans.
        """
        if prefix in WORDS:
            yield prefix, i
        if i == len(ctr): return
        for c, j in ctr[i].iteritems():
            if prefix + c in PREFIXES:
                for w, j2 in find_words(ctr, j, prefix + c):
                    yield w, j2
    
    def w_trans(ctr):
        """Like c_trans, but produce a word transition map."""
        result = [{} for i in xrange(len(ctr))]
        for i_ in xrange(len(ctr)):
            i = len(ctr) - i_ - 1
            for w, j in find_words(ctr, i):
                if j < len(result) and not result[j]:
                    continue
                result[i][w] = j
        return result
    
    def shortest_sentence(wt):
        """Given a word transition map, find the shortest possible sentence.
    
        We find the sentence that uses the entire morse code stream, and has
        the fewest number of words. If there are multiple sentences that
        satisfy this, we return the one that uses the smallest number of
        characters.
        """
        result = [-1 for _ in xrange(len(wt))] + [0]
        words = [None] * len(wt)
        for i_ in xrange(len(wt)):
            i = len(wt) - i_ - 1
            for w, j in wt[i].iteritems():
                if result[j] == -1: continue
                if result[i] == -1 or result[j] + 1 + len(w) / 30.0 < result[i]:
                    result[i] = result[j] + 1 + len(w) / 30.0
                    words[i] = w
        i = 0
        result = []
        while i < len(wt):
            result.append(words[i])
            i = wt[i][words[i]]
        return result
    
    def sentence_count(wt):
        result = [0] * len(wt) + [1]
        for i_ in xrange(len(wt)):
            i = len(wt) - i_ - 1
            for j in wt[i].itervalues():
                result[i] += result[j]
        return result[0]
    
    msg = 'JACK AND JILL WENT UP THE HILL'
    print sentence_count(w_trans(c_trans(encode(msg))))
    print shortest_sentence(w_trans(c_trans(encode(msg))))