I've seen the following question for a hiring interview:
How do you find three different strings that have the same hash code in C#?
In other words, given the strings
a
,b
, andc
, the following four statements should be true:a != b a != c a.GetHashCode() == b.GetHashCode() a.GetHashCode() == c.GetHashCode()
Notes:
- You shouldn't override
GetHashCode()
and you shouldn't use your ownString
class. Use default .NET implementation.- You don't need to know the exact implementation of
string.GetHashCode()
.- One should find the result relatively fast, without having to use multithreading.
I'm a bit puzzled by this one. Is there a way to do it, without actually enumerating strings one by one, which would definitively be very slow, and without checking the actual implementation of string.GetHashCode()
to figure out how to make collisions?
You only have to generate about 16 million hashes before you find one that occurs 3 times. This only takes a few seconds and fits into a reasonable amount of memory:
using System;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
Random rnd = new Random();
var dict = new Dictionary<int,String>();
var pairDict = new Dictionary<int, Tuple<String, String>>();
String[] result = null;
while(result == null) {
String s = rnd.Next(0x40000000).ToString("x8");
int hash = s.GetHashCode();
String match;
if (dict.TryGetValue(hash, out match)) {
if (s == match) {
// already tried this string
continue;
}
Tuple<String, String> pair;
if (pairDict.TryGetValue(hash, out pair)) {
if (s == pair.Item2) {
// already tried this string
continue;
}
result = new String[] {s, pair.Item1, pair.Item2};
} else {
pairDict.Add(hash, new Tuple<String, String>(match, s));
}
} else {
dict.Add(hash, s);
}
}
foreach(String s in result) {
Console.WriteLine( s + ".GetHashCode() == " + s.GetHashCode());
}
}
}