Search code examples
c++algorithmdynamic-programmingsuffix-treelcs

How to speed up calculation of length of longest common substring?


I have two very large strings and I am trying to find out their Longest Common Substring.

One way is using suffix trees (supposed to have a very good complexity, though a complex implementation), and the another is the dynamic programming method (both are mentioned on the Wikipedia page linked above).

Using dynamic programming alt text

The problem is that the dynamic programming method has a huge running time (complexity is O(n*m), where n and m are lengths of the two strings).

What I want to know (before jumping to implement suffix trees): Is it possible to speed up the algorithm if I only want to know the length of the common substring (and not the common substring itself)?


Solution

  • Will it be faster in practice? Yes. Will it be faster regarding Big-Oh? No. The dynamic programming solution is always O(n*m).

    The problem that you might run into with suffix trees is that you trade the suffix tree's linear time scan for a huge penalty in space. Suffix trees are generally much larger than the table you'd need to implement for a dynamic programming version of the algorithm. Depending on the length of your strings, it's entirely possible that dynamic programming will be faster.

    Good luck :)