Search code examples
pythonencryptionlinear-algebragalois-field

Decrypt binary sequence with random binary key. What's wrong with my script?


(Updated Below)

Attempting a problem from linear-algebra text Coding the Matrix by Philip Klein. Running into problems brute-forcing the ciphertext binary sequence against all possible keys. Problem 1.5.1, page 57:

An 11-symbol message has been encrypted as follows. Each symbol is represented by a number between 0 and 26 (A maps to 0, B to 25, space to 26.) Each number is represented by a five-bit binary sequence (0 maps to 00000, 26 to 11010). Finally, the resulting sequence of 55 bits is encrypted using a flawed version of the one-time pad: the key is not 55 bits but 11 copies of the same sequence of 5 random bits. The ciphertext is: '10101', '00100', '10101', '01011', '11001', '00011', '01011', '10101', '00100', '11001', '11010'

Goal is to figure out the plaintext. Issues I'm having are: One, my decoder function produces several 5-digit binary that are above int 26. This function is supposed to attempt the ciphertext binary sequence against every possible 5 digit binary sequence (the key) up to int 26, to produce every possible plaintext sequence. Two, am I supposed to use the Galois Field to ensure each binary sequence remains, well, binary? (1 + 1 = 0 and not 2) Any suggestions? I am attempting to learn linear-algebra (and improve upon my limited python abilities,) by using Klein's interesting text. It is rather difficult... Thank you!

import string

# The Cipher-text
Y = ['10101', '00100', '10101', '01011', '11001', '00011', '01011', '10101', '00100', '11001', '11010']


def trans(bit): # convert the bits into int
    x = ''.join(map(str, bit))
    return int(x, 2)


def decoder(cipher): # Try the ciphertext against each possible 5 bit sequence (up to 11010)
    pos_seq = ["".join("{0:05b}".format(x)) for x in range(27)]
    attempt = []
    for i in pos_seq:
        for j in cipher: # Add ciphertext binary to every possible binary 'key'.
            temp = [(int(i[0])+int(j[0])),(int(i[1])+int(j[1])),(int(i[2])+int(j[2])),
                    (int(i[3])+int(j[3])), (int(i[4])+int(j[4]))]
            attempt.append(temp)
    for i in range(len(attempt)): # Galois Field, 'two' becomes 'one'
        for k in attempt[i]:
            if k == 2:
                attempt[i][attempt[i].index(k)] = 0
    return attempt


new_list = decoder(Y)

decoded = [trans(i) for i in new_list] # translate each 5 digit sequence into int

alph = list(string.ascii_lowercase) # alphabet plus a space
alph.append(' ')

decoded2 = [alph[x] for x in decoded] # Translate int to letter or space

print(decoded2)

Update

Thanks to Dafang Cao and jason, I've edited the code as follows, and found the plaintext: eve is evil

  1. Increased range in decoder function to 32
  2. (Still have to wrap my head around GF(2) and XOR, including Dafang's use of x ^ y & mask)
  3. Sliced the list returned by decoder into 11-item lists using

chunks = [decoded[x:x+11] for x in range(0, len(decoded), 11)]

*as the cipher-text is made up of 11 'symbols'

  1. Mapped above list to the lambda function used by Dafang:
def decode(encoded):
    y = []
    for i in encoded:
        if i < 27:
            y.append(i)
    return ''.join(map(lambda x: alph[x], y))

for i in chunks:
    decoded2 = decode(i)
    print(decoded2)

Solution

  • There are 32 possible values using 5 bits. Knowing that valid values are 0-26, any decrypted value above 26 is a tell-tale sign that the key is not the key used in encryption. When brute forcing, you can skip the key as soon as you encounter such a value. In fact, I think that is how you are supposed to eliminate incorrect keys.

    Meanwhile, it's not stated that the key is no greater than 26. You should try all 32 combinations.

    Re. Galois Field, computer processors are naturally binary. You can take advantage of bit-wise operations like XOR to make the code faster. In fact, XOR is the add operation in GF(2). You can add 2 vectors in GF(2) using x ^ y & mask. When not dealing with GF(2), you can use modulo operator after an addition to "wrap" values into valid range x + y % n.

    import string
    
    # The Cipher-text
    Y = ['10101', '00100', '10101', '01011', '11001', '00011', '01011', '10101', '00100', '11001', '11010']
    # Parse to binary value
    y = list(map(lambda v: int(v, 2), Y))
    
    max_val = 32
    alphabet_size = 26
    
    # decrypt([0b11111, ...]) = [(key=0b1111, decrpyted=[0b11111, ...]), ...]
    def decrypt(encrypted):
        possible_keys = range(max_val)
        attempt = []
        for key in possible_keys:
            decrypted = []
            for symbol in encrypted:
                # XOR is equivalent to Add in GF(2)
                # If not GF(2), add then wrap around using modulo
                v = symbol ^ key & (max_val - 1)
                print('v = %d' % v)
                if v > alphabet_size:
                    break
                decrypted.append(v)
            if len(decrypted) < len(encrypted):
                print('bad key %d' % key)
                continue
            print('good key %d' % key)
            attempt.append((key, decrypted))
        return attempt
    
    # alphabet plus a space
    alph = list(string.ascii_lowercase)
    alph.append(' ')
    def decode(encoded):
        return ''.join(map(lambda x: alph[x], encoded))
    
    results = decrypt(y)
    
    for (key, encoded) in results:
        decoded = decode(encoded)
        print(decoded)