php email hash avatar

Is it possible for the MD5 of two different strings to be identical?


I'm trying to create a dynamic avatar for my website's users, something like Stack Overflow. I have a PHP script which generates an image based on a string:

path/to/avatar.php?hash=string

I want to use the MD5 of each user's email as the name of their avatar (and as the string that the PHP script generates the image from):

$email = $_GET['email'];
$hash  = md5($email);
copy("path/to/avatar.php?hash=$hash","path/img/$hash.jpg");

Now I want to be sure: can I use the MD5 of their emails as their avatar's name? I mean, aren't there two different strings that produce identical MD5 output? In other words, I want to know whether the output for two different strings will always be unique.

I don't know whether my question is clear. All I want to know is: is there any possibility that the MD5 of two different emails will be the same?


Solution

  • As the goal here is to use a hash for its uniqueness rather than its cryptographic strength, MD5 is acceptable, although I still wouldn't recommend it.

    If you do settle on using MD5, use a globally unique ID that you control rather than a user-supplied email address, along with a salt.

    For example:

    $salt = 'random string';
    $hash = md5($salt . $userId);
    

    However:

    • There is still a small chance of a collision: about 1 in 2^128 for any single pair of inputs, but because of the birthday paradox a collision becomes likely once you have on the order of 2^64 hashes. Remember this is only a probability, so even hash_n and hash_(n+1) could collide.
    • There is not a reasonable way to determine the userId from the hash (I don't consider indexing 128-bit hashes so you can query them to be reasonable).
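
To put rough numbers on the collision risk, here is a minimal sketch of the usual birthday-bound approximation, p ≈ 1 - exp(-n² / (2 · 2^128)). The function name collisionProbability is made up for illustration, and the formula assumes MD5 output is uniformly distributed, which is an idealisation.

```php
<?php
// Birthday-bound approximation: p ≈ 1 - exp(-n^2 / (2 * 2^bits)).
// Assumption: hash values are uniformly distributed over 2^bits outputs.
function collisionProbability(float $n, int $bits = 128): float
{
    $space = pow(2.0, $bits); // number of possible hash values
    // -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny
    return -expm1(-($n * $n) / (2.0 * $space));
}

printf("1 million hashes: %.3e\n", collisionProbability(1e6));
printf("2^64 hashes:      %.3f\n", collisionProbability(pow(2.0, 64)));
```

Under these assumptions the probability for a million users is on the order of 10^-27, and it only reaches roughly 39% at 2^64 hashes. This covers accidental collisions only, not deliberately crafted ones.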

    You use StackOverflow as an example.

    User profiles on this site look like: http://stackoverflow.com/users/2805376/shafizadeh

    So what is wrong with having avatar URLs like http://your_site/users/2805376/avatar.png? The back-end storage could simply be /path/to/images/002/805/376.png

    This guarantees a unique name, and gives you a simple way of storing and locating images and of mapping an image back to the user it belongs to.
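
The ID-based layout could be sketched roughly as follows. The function names avatarPath and userIdFromAvatarPath are hypothetical, and the 9-digit zero padding and the /path/to/images base directory are assumptions read off the example path above, not a prescribed scheme.

```php
<?php
// Hypothetical helpers; base directory and 9-digit padding are assumptions
// taken from the example path /path/to/images/002/805/376.png.
function avatarPath(int $userId, string $baseDir = '/path/to/images'): string
{
    if ($userId < 0 || $userId > 999999999) {
        throw new InvalidArgumentException('userId must fit in 9 digits');
    }
    // Left-pad to 9 digits, then split into three directory levels
    $padded = str_pad((string) $userId, 9, '0', STR_PAD_LEFT);
    return $baseDir . '/' . implode('/', str_split($padded, 3)) . '.png';
}

function userIdFromAvatarPath(string $path): ?int
{
    // Match the trailing NNN/NNN/NNN.png and join the digits back together
    if (preg_match('#/(\d{3})/(\d{3})/(\d{3})\.png$#', $path, $m) !== 1) {
        return null;
    }
    return (int) ($m[1] . $m[2] . $m[3]);
}

echo avatarPath(2805376), "\n";
```

Because the path is derived directly from the ID, the mapping is reversible without any lookup table, which is what a hash-based name cannot offer.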