Determine ROT encoding


I want to determine which ROT encoding was used and, based on that, decode the text correctly.

Also, I have found the following code, which correctly decodes the rot13 string "sbbone" to "foobar":

import codecs
codecs.decode('sbbone', 'rot_13')

The thing is, I'd like to run this Python script against an existing file that is rot13-encoded (for example, rot13.py encoded.txt).

Thank you!


Solution

  • To answer the second part of your question (decoding something in ROT-x), you can use the following code:

    def encode(s, ROT_number=13):
        """Encodes a string (s) using ROT (ROT_number) encoding."""
        ROT_number %= 26  # To avoid IndexErrors
        alpha = "abcdefghijklmnopqrstuvwxyz" * 2
        alpha += alpha.upper()
        def get_i():
            for i in range(26):
                yield i  # indexes of the lowercase letters
            for i in range(52, 78):
                yield i  # indexes of the uppercase letters (alpha[52] is "A")
        ROT = {alpha[i]: alpha[i + ROT_number] for i in get_i()}
        return "".join(ROT.get(i, i) for i in s)
    
    
    def decode(s, ROT_number=13):
        """Decodes a string (s) using ROT (ROT_number) encoding."""
        # Rotating by the complementary shift undoes the encoding.
        return encode(s, 26 - ROT_number % 26)
    

    To answer the first part of your question (finding the ROT encoding of an arbitrarily encoded string), you probably want to brute-force it: try all 26 ROT encodings and check which one makes the most sense. A quick(-ish) way to do this is to get a newline-delimited file (e.g. cat\ndog\nmouse\nsheep\nsay\nsaid\nquick\n... where \n is a newline) containing the most common words in the English language, and then check which encoding produces the most words found in that file.

    with open("words.txt") as f:
        # frozenset for fast membership tests
        words = frozenset(f.read().lower().split("\n"))


    def get_most_likely_encoding(s, delimiter=" "):
        """Returns the ROT number (0-25) whose decoding of s has the most known words."""
        alpha = "abcdefghijklmnopqrstuvwxyz" + delimiter
        for punctuation in "\n\t,:; .()":
            # str.replace returns a new string, so the result must be reassigned
            s = s.replace(punctuation, delimiter)
        s = "".join(c for c in s if c.lower() in alpha)
        # encode(s, 26 - enc) undoes a ROT-enc encoding
        word_count = [sum(w.lower() in words for w in encode(
                s, 26 - enc).split(delimiter)) for enc in range(26)]
        return word_count.index(max(word_count))
    

    A word list you could use on Unix machines is /usr/dict/words (/usr/share/dict/words on most modern systems); copies of it can also be found online.
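
To run this against a file, as in rot13.py encoded.txt, the pieces can be combined into one script. The following is only a minimal, self-contained sketch: it uses str.translate instead of the dictionary-based encode above, and the names rot, guess_shift and main, as well as the default word-list path words.txt, are assumptions made for illustration.

```python
import string
import sys


def rot(text, shift):
    """Rotate ASCII letters forward by `shift` places; leave everything else alone."""
    shift %= 26  # also turns a negative shift into its positive equivalent
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    table = str.maketrans(
        lower + upper,
        lower[shift:] + lower[:shift] + upper[shift:] + upper[:shift],
    )
    return text.translate(table)


def guess_shift(text, words):
    """Return the ROT number (0-25) whose decoding contains the most known words."""
    def score(n):
        decoded = rot(text, -n).lower()
        # Treat every non-letter as a word separator.
        cleaned = "".join(c if c.isalpha() else " " for c in decoded)
        return sum(w in words for w in cleaned.split())
    return max(range(26), key=score)


def main(path, word_list="words.txt"):  # "words.txt" is an assumed default
    with open(word_list) as f:
        words = frozenset(f.read().lower().split())
    with open(path) as f:
        text = f.read()
    n = guess_shift(text, words)
    print(f"Most likely encoding: ROT-{n}")
    print(rot(text, -n), end="")


# Only runs when a file name is given, e.g.: python rot13.py encoded.txt [words.txt]
if __name__ == "__main__" and len(sys.argv) > 1:
    main(*sys.argv[1:3])
```

If no shift produces any known word, guess_shift falls back to 0, so the text is printed unchanged. Short inputs or a small word list can make the guess unreliable.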