Search code examples
pythonpython-2.7cryptographypermutation

Influence of choosing string as seed of random on the output


In python, in order to permutate the lettters of a string one can write

import random
random.seed(str_key)
length = range(len(original_str))
random.shuffle(length)
join "".join([original_key[idx] for idx in length])

I wonder what does seed do with the key string and how it produces from it the permutation (or says shuffle how to do that). For example if I take the key to be 'hGd' how I get that specific output while if I write another key like 'AGd' I get another output?

EDIT: The decryption algorithm I tried to use on that code is:

for key in itertools.product(*3*[string.ascii_letters]):
    indices = range(len(enc_msg))
    list_encrypted_msg = list(enc_msg)
    random.seed(key)
    random.shuffle(indices)
    decrypted = ""

    for idx in indices[::-1]:
        decrypted += list_encrypted_msg[idx]

    try:
        if not decrypted.index("The"):
            print decrypted
    except ValueError:
        continue
return "not found!"

Solution

  • What seed() does with its argument is to pass it to the built-in hash() function, which converts it to a 32 bit signed integer, in other words a number in the range -2,147,483,648 to 2,147,483,647. That number is then used as the starting number by the pseudo-random integer generator (by default, the Mersenne Twister algorithm) that is the heart of the standard random functions.

    Each time a pseudo-random number generator (PRNG) is called it does a particular arithmetic operation on its current number to produce a new number. It may return that number as is, or it may return a modified version of that number. See Wikipedia for a simple type of PRNG.

    With a good PRNG it is very hard to predict what the next number in the sequence is going to be, and Mersenne Twister is quite good. So it's not easy to predict the effect that different seeds will have on the output.

    BTW, you can pass seed() any kind of hashable object. So it can be passed an int, string, tuple, etc, but not a list. But as I said above, whatever you pass it, it gets converted to a number.

    Update: In recent versions of Python, random.seed takes an optional version arg: version 1 works as I described above, but version 2 (the default in Python 3.2+) a strbytes, or bytearray object gets converted to an int and all of its bits are used.

    And I guess I should mention that if you call seed() without a seed value it uses the system entropy pool to generate a seed value, and if the system doesn't provide an entropy pool (which is unlikely, except for extremely tiny or old embedded systems), it uses the current time as its seed value.


    The Mersenne Twister algorithm has a period of 2**19937 - 1, which is a number of about 6000 decimal digits. So it takes a very long time before the cycle of integers it produces repeats exactly. Of course, individual integers and sub-sequences of integers will repeat much sooner. And a cryptographic attack on it only needs 624 (full) outputs to determine the position in the cycle. Python's version of Mersenne Twister doesn't actually return the integers it calculates, it converts them to 53 bit floating-point numbers.

    Please see the Wikipedia article on the Mersenne Twister if you're curious to know how it works. Mersenne Twister was very impressive when it was first published, but there are now superior RNGs that are faster, more efficient, and have better statistical properties, eg the PCG family. We don't have PCG in the Python standard library yet, but PCG is now the default PRNG in Numpy.

    FWIW, here's a slightly improved version of your program.

    import random
    
    #Convert string to list
    msg = list(original_str)
    
    random.seed(str_key)
    random.shuffle(msg)
    print "".join(msg)
    

    Now, onto your decryption problem. :) Is the message you have to decrypt merely scrambled, as by the program above, or does it use some other form of encryption? If it's merely scrambled, it will be relatively easy to unscramble. So unless you tell me otherwise, I shall assume that to be the case.

    You said that the key length is 3. Is the key purely alphabetic, or can the 3 characters in the key be anything in the range chr(0) to chr(255)? Either way, that's not really a lot of keys to check, and a Python program will be able to unscramble the message using a brute-force search of all keys in less than one second minute.


    To iterate over all possible keys, you can do this:

    from itertools import product
    from string import ascii_letters
    
    for k in product(*3*[ascii_letters]):
        str_key = ''.join(k)
    

    I used product() in that code because we want to generate all possible strings of 3 ascii letters, so we want the Cartesian product of 3 copies of ascii_letters. 3*[ascii_letters] is equivalent to [ascii_letters, ascii_letters, ascii_letters] and putting * in front unpacks that list so that product() gets 3 separate args. If we use permutations() then we don't get any strings with repeated characters. To illustrate:

    >>> import itertools
    >>> s='abc'
    >>> [''.join(u) for u in itertools.permutations(s, 3)]
    ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
    >>> [''.join(u) for u in itertools.product(*3*[s])]
    ['aaa', 'aab', 'aac', 'aba', 'abb', 'abc', 'aca', 'acb', 'acc', 
     'baa', 'bab', 'bac', 'bba', 'bbb', 'bbc', 'bca', 'bcb', 'bcc', 
     'caa', 'cab', 'cac', 'cba', 'cbb', 'cbc', 'cca', 'ccb', 'ccc']
    

    Update: product takes a repeat keyword arg, so we can simplify that to itertools.product(s, repeat=3). ......

    I thought you said that the string to be decoded has 42 chars, but there are only 40 chars in euTtSa:0 kty1h a0 nlradstara atlot 5wtic. Also, the appearance of the digits 0 & 5 in that string is a bit of a worry, although I guess the original unscrambled version could have digits in it...

    Anyway, I just tried unscrambling that string using the shuffle algorithm with all possible 140608 3 letter keys and printing the permutations produced that begin with The. There are only 5 of them, and only one of those had a space after The. But in every case the rest of the unscrambled string is garbage. My guess is that you've misunderstood the encryption algorithm that your lecturer used.


    Just in case you were wondering how random.shuffle() works, you can see Python source code here; the C code for the random module is here.

    It's the Yates-Fisher shuffle, which is like a randomized version of one pass through a selection sort.

    Another cute method that's sometimes seen is to sort the list using a random comparison function, or a random key function. Eg

    >>> import random
    >>> random.seed(42)
    >>> for i in range(10):
    ...   s.sort(key=lambda i:random.random())
    ...   print ''.join(s)
    ...
    gabecdf
    dbgfeac
    agbfdce
    cebdgaf
    fgedbca
    afbecgd
    bcaegfd
    aebcdfg
    bacgfed
    fgdebca
    

    However, this shuffling technique is relatively slow, and it has bad statistical properties. So please do not use it! The Fisher-Yates technique used by random.shuffle() is (pretty much) the optimal shuffling algorithm.


    Reversing the shuffling procedure for a given key

    Let's look at what happens when we shuffle a simple range.

    from random import seed, shuffle
    
    r = range(5)
    key = 'zap'
    seed(key)
    shuffle(r) 
    

    After the shuffle, r will be

    [2, 4, 1, 3, 0]

    So to unshuffle r we need to build this list:

    [r[4], r[2], r[0], r[3], r[1]]

    Can you see how to do that? If you can't figure it out, I'm happy to post my code, but I think you should spend a little bit of time trying to figure it out first. Hint: Don't try to do it in a list comprehension, just use a for loop.


    Ok. You've struggled with this long enough. Here's my decoder.

    #! /usr/bin/env python
    
    ''' Unscramble a string of text by brute force
    
    From http://stackoverflow.com/questions/26248379/influence-of-choosing-string-as-seed-of-random-on-the-output
    
    '''
    
    import sys
    from random import seed, shuffle
    from itertools import product
    from string import ascii_letters
    
    
    def scramble(seq, key):
        seed(key)
        msg = seq[:]
        shuffle(msg)
        return msg
    
    
    def scramble_old(seq, key):
        seed(key)
        r = range(len(seq))
        shuffle(r)
        return [seq[i] for i in r]
    
    
    def unscramble(seq, key):
        seed(key)
        r = range(len(seq))
        shuffle(r)
        newseq = len(seq) * [None]
        for i, j in enumerate(r):
            newseq[j] = seq[i]
        return newseq
    
    
    def test():
        key = 'zap'
        #orig = 'quickbrownfox'
        orig = '01234'
    
        print 'orig:       ', orig
        shuf = scramble(list(orig), key)
        print 'shuffled:   ', ''.join(shuf)
        unshuf = unscramble(shuf, key)
        print 'unshuffled: ', ''.join(unshuf)
    
    
    def decode(seq, begin):
        count = 0
        begin_len = len(begin)
        for k in product(*3*[ascii_letters]):
            key = ''.join(k)
            dup = seq[:]
            newseq = unscramble(dup, key)
            if newseq[:begin_len] == begin:
                count += 1
                print '%s: [%s] %s' % (key, ''.join(newseq), count)
                #print '     [%s]\n' % ''.join(scramble(newseq, key))
    
    
    def main():
        original_str = 'euTtSa:0 kty1h a0  nlradstara  atlot 5wtic'.lower()    
        original_list = list(original_str)
        print '     [%s], %d\n' % (original_str, len(original_str))
    
        decode(original_list, begin=list('the'))
    
    
    if __name__ == '__main__':
        #test()
        main()