Efficiently find the shortest unique string ending in array of strings


I have an array of pseudo-random strings like this:

let values = [
    '4730788382dd8d15bD1Dfb',
    '078846cf883d8d15bD1DZb',
    '4730g21e260857Fb5771d3',
    'fecc51b693F9A0Cec49fd2',
    'c14a621e263857Fb577fd3',
    '7936CcfF6cD3bd71DBF121',
    '4730g21e260857Fb6771d3',
    'A915CcfF6cD3bd71DBC121',
    'd7B1F43E05985E88b1EF10',
    '4730g21e263857Fb5771d3',
];

An external API, given a string, will do fuzzy matching starting from the end.

'fb' will return '4730788382dd8d15bD1Dfb'

because it is the only string in the array ending in fb. If the input string doesn't match exactly 1 element, the request is invalid. The list gets updated from time to time, so I'd like a function that maps each string to the minimal suffix length required to guarantee exactly one match.
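
To make the validity rule concrete, here is a minimal sketch of the check described above. It is only a stand-in for the external API: `countMatches` is a hypothetical helper, and it assumes the matching is a plain, case-sensitive suffix comparison, which the real API may not do exactly.

```javascript
// Hypothetical stand-in for the external API's matching rule.
// Assumption: a plain, case-sensitive comparison against the end of each string.
function countMatches(values, suffix) {
    return values.filter((value) => value.endsWith(suffix)).length;
}

// Three strings taken from the array above.
const sample = [
    '4730788382dd8d15bD1Dfb',
    '078846cf883d8d15bD1DZb',
    'fecc51b693F9A0Cec49fd2',
];

console.log(countMatches(sample, 'b'));  // 2 matches, so the request would be invalid
console.log(countMatches(sample, 'fb')); // 1 match, so the request would be valid
```

A request is valid only when the count is exactly 1, so the goal is the shortest ending per string for which this count drops to 1.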

The resulting map could look like this:

let result = new Map([
    ['4730788382dd8d15bD1Dfb', 2],
    ['078846cf883d8d15bD1DZb', 2],
    ['4730g21e260857Fb5771d3', 12],
    ['fecc51b693F9A0Cec49fd2', 1],
    ['c14a621e263857Fb577fd3', 3],
    ['7936CcfF6cD3bd71DBF121', 4],
    ['4730g21e260857Fb6771d3', 6],
    ['A915CcfF6cD3bd71DBC121', 4],
    ['d7B1F43E05985E88b1EF10', 1],
    ['4730g21e263857Fb5771d3', 12],
]);

For each string element, the value is the number of characters, counted from the end, that must be sent to get exactly one match.

I have a working solution, but it uses nested loops and isn't elegant or efficient. Does this problem have a name? I'm pretty sure it can be done in O(n).

Some properties:

  • All strings are unique.
  • The array has 1000 - 5000 elements and is unlikely to go past 5000.

Thanks for any ideas / links!


Solution

  • An efficient approach would involve using tries for this task.

    1. Insert all strings, reversed, into the trie, and keep in each node a count of how many strings pass through it. For n strings of length m, this has a time complexity of O(n*m).
    2. For each string, traverse the trie again (with the string reversed here as well). Stop at the first node that only one string passes through, i.e. whose count is 1, which means no other string shares the ending read so far. A node that merely has a single child is not enough, because several strings can share a non-branching stretch and diverge further down. The number of nodes traversed up to that point is the minimum length you need. For n strings of length m, this also has a time complexity of O(n*m).

    Overall time complexity = O(n*m) for n strings of length m.
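
    The two steps above could be sketched in JavaScript roughly as follows. This is a minimal sketch, not a definitive implementation. It assumes each trie node stores a count of the strings passing through it, so that "no other string continues from here" can be tested as `count === 1`. The function name `shortestUniqueSuffixLengths` and the node shape are made up for illustration, and matching is assumed to be exact and case-sensitive. The same idea is often described as finding the shortest unique prefix, applied here to reversed strings.

```javascript
// Sketch: reversed-string trie with pass-through counts (all names hypothetical).
function shortestUniqueSuffixLengths(values) {
    const root = { count: 0, children: new Map() };

    // Step 1: insert every string back to front, counting how many
    // strings pass through each node.
    for (const value of values) {
        let node = root;
        for (let i = value.length - 1; i >= 0; i--) {
            const ch = value[i];
            let child = node.children.get(ch);
            if (child === undefined) {
                child = { count: 0, children: new Map() };
                node.children.set(ch, child);
            }
            child.count++;
            node = child;
        }
    }

    // Step 2: walk each string again and stop at the first node that
    // only this string passes through; its depth is the answer.
    const result = new Map();
    for (const value of values) {
        let node = root;
        let length = null; // stays null if no ending of this string is unique
        for (let i = value.length - 1; i >= 0; i--) {
            node = node.children.get(value[i]);
            if (node.count === 1) {
                length = value.length - i;
                break;
            }
        }
        result.set(value, length);
    }
    return result;
}

const values = [
    '4730788382dd8d15bD1Dfb',
    '078846cf883d8d15bD1DZb',
    '4730g21e260857Fb5771d3',
    'fecc51b693F9A0Cec49fd2',
    'c14a621e263857Fb577fd3',
    '7936CcfF6cD3bd71DBF121',
    '4730g21e260857Fb6771d3',
    'A915CcfF6cD3bd71DBC121',
    'd7B1F43E05985E88b1EF10',
    '4730g21e263857Fb5771d3',
];

const result = shortestUniqueSuffixLengths(values);
console.log([...result.values()].join(', ')); // 2, 2, 12, 1, 3, 4, 6, 4, 1, 12
```

    A string that is itself the ending of another string has no unique ending at all; in this sketch it maps to `null` rather than a length. With equal-length unique strings, as in the question, that case cannot occur.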