java, java-8, apache-commons-lang3

Find relevant parts in a collection of strings


I've got a set of path strings:

/content/example-site/global/library/about/contact/thank-you.html
/content/example-site/global/corporate/about/contact/thank-you.html
/content/example-site/countries/uk/about/contact/thank-you.html
/content/example-site/countries/de/about/contact/thank-you.html
/content/example-site/others/about/contact/thank-you.html
...

(Often the paths are much longer than this)

As you can see, it is hard to spot the differences at a glance. That's why I would like to highlight the relevant parts of the strings.

To find the differences I currently calculate the common prefix and suffix of all strings:

String prefix = getCommonPrefix(paths);
String suffix = getCommonSuffix(paths);
for (String path : paths) {
    String relevantPath = path.substring(prefix.length(), path.length() - suffix.length());
    // OUTPUT: prefix + "<b>" + relevantPath + "</b>" + suffix
}

For the prefix I'm using StringUtils.getCommonPrefix from Commons Lang.

For the suffix I couldn't find a utility, neither in Commons Lang nor in Guava (the latter only has one for exactly two strings). So I had to write my own, similar to the one from Commons Lang.
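For illustration, a hand-written common suffix for any number of strings could look roughly like the sketch below. It uses only the JDK. The class name `CommonSuffix` and the method name `getCommonSuffix` are placeholders, not an existing library API, and returning an empty string for a null or empty array is an assumption about the desired behaviour. Elements are assumed to be non-null.

```java
final class CommonSuffix {

    private CommonSuffix() {
    }

    /** Returns the longest suffix shared by all given strings, or "" if there is none. */
    static String getCommonSuffix(String... strs) {
        if (strs == null || strs.length == 0) {
            return ""; // assumption: no input means no common suffix
        }
        String first = strs[0];
        // Length of the suffix of 'first' that is still shared by every string seen so far.
        int len = first.length();
        for (int i = 1; i < strs.length; i++) {
            String s = strs[i];
            int max = Math.min(len, s.length());
            int k = 0;
            // Walk backwards from the end of both strings while the characters match.
            while (k < max
                    && first.charAt(first.length() - 1 - k) == s.charAt(s.length() - 1 - k)) {
                k++;
            }
            len = k;
        }
        return first.substring(first.length() - len);
    }
}
```

The comparison runs from the end of each string, so no reversed copies are created.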

I'm now wondering whether I missed a function in one of these libraries, or whether there is an easy way to do this with Java 8 streams?


Solution

  • Here is a small hack. I don't claim it is optimal, but it may be worth pursuing if no other option is available: reverse every string, take the common prefix of the reversed strings, and reverse the result to get the common suffix:

    String[] reversedPaths = new String[paths.length];
    for (int i = 0; i < paths.length; i++) {
        reversedPaths[i] = StringUtils.reverse(paths[i]);
    }
    String suffix = StringUtils.reverse(StringUtils.getCommonPrefix(reversedPaths));
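Since the question also asks about Java 8 streams, the same reverse-and-prefix idea can be sketched with `Stream.reduce` and without Commons Lang. This is only a sketch under assumptions: the class `PathHighlighter` and all of its method names are made up for illustration, and the `Math.max` guard is an addition to keep `substring` from failing when prefix and suffix overlap (for example, when all paths are identical).

```java
import java.util.List;
import java.util.stream.Collectors;

final class PathHighlighter {

    private PathHighlighter() {
    }

    /** Longest common prefix of two strings; used as the reduction step. */
    static String commonPrefixOfTwo(String a, String b) {
        int max = Math.min(a.length(), b.length());
        int i = 0;
        while (i < max && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return a.substring(0, i);
    }

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    /** Common prefix of all paths, or "" for an empty list. */
    static String commonPrefix(List<String> paths) {
        return paths.stream().reduce(PathHighlighter::commonPrefixOfTwo).orElse("");
    }

    /** Common suffix: the reversed common prefix of the reversed paths. */
    static String commonSuffix(List<String> paths) {
        return reverse(paths.stream()
                .map(PathHighlighter::reverse)
                .reduce(PathHighlighter::commonPrefixOfTwo)
                .orElse(""));
    }

    /** Wraps the part between common prefix and common suffix in <b>...</b>. */
    static List<String> highlight(List<String> paths) {
        String prefix = commonPrefix(paths);
        String suffix = commonSuffix(paths);
        return paths.stream().map(path -> {
            int start = prefix.length();
            // Guard: prefix and suffix may overlap, so never let 'end' drop below 'start'.
            int end = Math.max(start, path.length() - suffix.length());
            return path.substring(0, start) + "<b>" + path.substring(start, end) + "</b>"
                    + path.substring(end);
        }).collect(Collectors.toList());
    }
}
```

For the paths in the question, `highlight` would turn the `uk` entry into `/content/example-site/<b>countries/uk</b>/about/contact/thank-you.html`. The reduction still creates a reversed copy of every path, so it is no more efficient than the loop above, just more compact.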