Search code examples
stringalgorithmdata-structuresprefix

Least number characters which can make a set of strings unique


What is the least number characters (starting from 1st character) which can make a set of strings unique.

For example, set of strings:

{
    'january',
    'february',
    'march',
    'april',
    'may',
    'june',
    'july'
}

Now, we cannot use just 1st character since 'j' is both in 'june' as well as 'july' (also, 'm' is in 'march' and 'may'). We also cannot use 1st 2 characters as 'ma' is both in 'march' and 'may'.

But, we can use 1st 3 characters!

What can be an optimal algorithm to return this number (besides an obvious brute force) ?


Solution

  • First thing you can optimize is to do a binary search over the size of prefix needed. The function is monotonous- if a given number of chars n is enough, then any value larger than n will also do.

    Second - you may compute rolling hashes for each string so that you can get the hash code for a given prefix of each string in constant time. Thus you will have to verify numbers(hash codes) for uniqueness instead of a string which of course is faster. I suggest a rolling hash like in rabin-karp.