I have a list of strings for which I need to find ALL common unique substrings (actually paths) with minimal length in them. Example:
/a/b/c
/a/b
/a
/d/e/f
/d/e
/g/h
For this input, I need the following result:
/a
/d/e
/g/h
As you see, I need the paths (or substrings) with the minimal length that have a unique prefix. /a is the minimal substring for all paths starting with /a. /d/e is the minimal substring for all paths starting with /d/e. The same goes for /g/h.
A practical application of this is to find all roots of the path trees that have a certain file in it to analyze them further. Consider this example:
/a/b/c/index.html
/a/b/index.html
/a/index.html
/d/e/f/index.html
/d/e/index.html
/g/h/index.html
Let's say I want to have the topmost (in terms of the root) paths that contain an index.html file. As a result, I want "/a/index.html", "/d/e/index.html" and "/g/h/index.html".
Any ideas? There is a lot of theory and examples for the "simple" longest common substring problem, but I have not yet found a solution for finding ALL the common longest substrings efficiently.
A solution with pseudo code would be highly appreciated.
Now with the improved description I think the following algorithm will do: