Search code examples
phphashcryptographyctfripemd

CTF Type Juggling with ripemd160 hash


I am trying to solve a CTF in which the juggling type should be used. The code is:

if ($_GET["hash"] == hash("ripemd160", $_GET["hash"]))
{
    echo $flag;
}
else
{
    echo "<h1>Bad Hash</h1>";
}

I made a script in python which checks random hashes in ripemd160 that begins with "0e" and ends with only numbers. The code is:

def id_generator(size, chars=string.digits):
    return ''.join(random.choice(chars) for _ in range(size))
param = "0e"
results = []
while True:
    h = hashlib.new('ripemd160')
    h.update("{0}".format(str(param)).encode('utf-8'))
    hashed = h.hexdigest()
    if param not in results:
        print(param)
        if hashed.startswith("0e") and hashed[2:].isdigit():
            print(param)
            print(hashed)
            break
        results.append(param)
    else:
        print("CHECKED")
    param = "0e" + str(id_generator(size=10))

Any suggestions on how to solve it? Thank you!


Solution

  • There seems to be a bit of misunderstanding in the comments, so I'll start by explaining the problem a little more:

    Type juggling refers to the behaviour of PHP whereby variables are implicitly cast to different data types under certain conditions. For example, all the following logical expressions will evaluate to true in PHP:

    0 == 0                       // int vs. int
    "0" == 0                     // str -> int
    "abc" == 0                   // any non-numerical string -> 0
    "1.234E+03" == "0.1234E+04"  // string that looks like a float -> float
    "0e215962017" == 0           // another string that looks like a float
    

    The last of these examples is interesting because its MD5 hash value is another string consisting of 0e followed by a bunch of decimal digits (0e291242476940776845150308577824). So here's another logical expression in PHP that will evaluate to true:

    "0e215962017" == md5("0e215962017")
    

    To solve this CTF challenge, you have to find a string that is "equal" to its own hash value, but using the RIPEMD160 algorithm instead of MD5. When this is provided as a query string variable (e.g., ?hash=0e215962017), then the PHP script will disclose the value of a flag.

    Fake hash collisions like this aren't difficult to find. Roughly 1 in every 256 MD5 hashes will start with '0e', and the probability that the remaining 30 characters are all digits is (10/16)^30. If you do the maths, you'll find that the probability of an MD5 hash equating to zero in PHP is approximately one in 340 million. It took me about a minute (almost 216 million attempts) to find the above example.

    Exactly the same method can be used to find similar values that work with RIPEMD160. You just need to test more hashes, since the extra hash digits mean that the probability of a "collision" will be approximately one in 14.6 billion. Quite a lot, but still tractable (in fact, I found a solution to this challenge in about 15 minutes, but I'm not posting it here).

    Your code, on the other hand, will take much, much longer to find a solution. First of all, there is absolutely no point in generating random inputs. Sequential values will work just as well, and will be much faster to generate.

    If you use sequential input values, then you also won't need to worry about repeating the same hash calculations. Your code uses a list structure to store previously hashed values. This is a terrible idea. Searching for an item in a list is an O(n) operation, so once your code has (unsuccessfully) tested a billion inputs, it will have to compare every new input against each of these billion inputs at each iteration, causing your code to grind to a complete standstill. Your code would actually run a lot faster if you didn't bother checking for duplicates. When you have time, I suggest you learn when to use lists, dicts and sets in Python.

    Another problem is that your code only tests 10-digit numbers, which means it can only test a maximum of 10 billion possible inputs. Based on the numbers given above, are you sure this is a sensible limit?

    Finally, your code is printing every single input string before you calculate its hash. Before your program outputs a solution, you can expect it to print out somewhere in the order of a billion screenfuls of incorrect guesses. Is there any point in doing this? No.

    Here's the code I used to find the MD5 collision I mentioned earlier. You can easily adapt it to work with RIPEMD160, and you can convert it to Python if you like (although the PHP code is much simpler):

    $n = 0;
    while (1) {
        $s = "0e$n";
        $h = md5($s);
        if ($s == $h) break;
        $n++;
    }
    echo "$s : $h\n";
    

    Note: Use PHP's hash_equals() function and strict comparison operators to avoid this sort of vulnerability in your own code.