Search code examples
c#performancehashcode

How do I find three different strings with hash collisions, without using a brute force approach?


I've seen the following question for a hiring interview:

How do you find three different strings that have the same hash code in C#?

In other words, given the strings a, b, and c, the following four statements should be true:

a != b
a != c
a.GetHashCode() == b.GetHashCode()
a.GetHashCode() == c.GetHashCode()

Notes:

  1. You shouldn't override GetHashCode() and you shouldn't use your own String class. Use default .NET implementation.
  2. You don't need to know the exact implementation of string.GetHashCode().
  3. One should find the result relatively fast, without having to use multithreading.

I'm a bit puzzled by this one. Is there a way to do it, without actually enumerating strings one by one, which would definitively be very slow, and without checking the actual implementation of string.GetHashCode() to figure out how to make collisions?


Solution

  • You only have to generate about 16 million hashes before you find one that occurs 3 times. This only takes a few seconds and fits into a reasonable amount of memory:

    using System;
    using System.Collections.Generic;
    
    public class Program
    {
        public static void Main()
        {
            Random rnd = new Random();
            var dict = new Dictionary<int,String>();
            var pairDict = new Dictionary<int, Tuple<String, String>>();
            String[] result = null;
            while(result == null) {
                String s = rnd.Next(0x40000000).ToString("x8");
                int hash = s.GetHashCode();
                String match;
                if (dict.TryGetValue(hash, out match)) {
                    if (s == match) {
                        // already tried this string
                        continue;
                    }
                    Tuple<String, String> pair;
                    if (pairDict.TryGetValue(hash, out pair)) {
                        if (s == pair.Item2) {
                            // already tried this string
                            continue;
                        }
                        result = new String[] {s, pair.Item1, pair.Item2};
                    } else {
                        pairDict.Add(hash, new Tuple<String, String>(match, s));
                    }
                } else {
                    dict.Add(hash, s);
                }
            }
            foreach(String s in result) {
                Console.WriteLine( s + ".GetHashCode() == " + s.GetHashCode());
            }
        }
    }