Search code examples
pythonhashcryptographyhash-collisionbirthday-paradox

How can I find a collision for a toy hash function?


I'd like to find a collision for a simple hash function below (python):

def hash_function(s=''):   # 'Hello World!' -> 7b2ea1ba
    a, b, c, d = 0xa0, 0xb1, 0x11, 0x4d
    result_hash = ''

    for byte in bytes(s, 'ascii'):
        a ^= byte
        b = b ^ a ^ 0x55
        c = b ^ 0x94
        d = c ^ byte ^ 0x74

    for i in [d, c, a, b]:
        tmp = str(hex(i))[2:]
        result_hash += tmp if len(tmp) is 2 else '0' + tmp

    return result_hash

here's also a js implementation in jsbin

I've found this question on SO, but the answer there wasn't quite comprehensible to me.

The length of the output of the function is always equal to 8. a, b, c and d variables are integers that are converted into hex values in the end to form the resulting hash, i.e. 123 -> 7b, 46 -> 2e, 13 -> 0d and so on.


So, could you help me with finding a collision for that function?


Solution

  • A simple way to find pairs of strings that have the same hash is to generate random strings, hash them and store the results in a dict, using the hash as the key and the string as the value. If you get a hash that's already in the dict, print it.

    I've optimized your hash_function slightly, and made the code Python 2 / 3 compatible.

    from __future__ import print_function
    from random import choice, randrange, seed 
    
    def hash_function(s=''):   # 'Hello World!' -> 7b2ea1ba
        a, b, c, d = 0xa0, 0xb1, 0x11, 0x4d
    
        for byte in bytearray(s):
            a ^= byte
            b = b ^ a ^ 0x55
            c = b ^ 0x94
            d = c ^ byte ^ 0x74
    
        return format(d<<24 | c<<16 | a<<8 | b, '08x') 
    
    s = b'Hello World!'
    print(s, hash_function(s))
    
    #ASCII chars that print nicely
    ascii = b''.join([chr(i) for i in range(33, 127)])
    
    seed(37)
    
    found = {}
    for j in range(5000):
        #Build a random 4 byte random string
        s = b''.join([choice(ascii) for _ in range(4)])
        h = hash_function(s)
        if h in found:
            v = found[h]
            if v == s:
                #Same hash, but from the same source string
                continue
            print(h, found[h], s)
        found[h] = s
    

    output

    Hello World! 7b2ea1ba
    0944dbd0 TXN9 YXC9
    105a9dce wA5> rA0>
    7a29e4bd %+m' -+e'
    7776b2e2 u&4u n&/u
    7c3ea3aa D-\6 z-b6
    6d46d1d2 `<r_ "<0_
    6a26e0b2 ;;x8 ";a8
    1033eda7 ,AwW =AfW
    627395e7 #3@e ;3Xe
    7d6ee7fa D,Hg `,lg
    3c2bb2bf NmRc Cm_c
    1e31b9a5 nOc[ oOb[
    1a49f7dd MKv' ]Kf'
    161beb8f )G\y IG<y
    0247bbd3 !SX1 VS/1