Search code examples
algorithmstringcomplexity-theorygraph-algorithmsuffix-tree

String analysis


Given a sequence of operations:

a*b*a*b*a*a*b*a*b

is there a way to get the optimal subdivision to enable reusage of substring.

making

a*b*a*b*a*a*b*a*b => c*a*c, where c = a*b*a*b

and then seeing that

a*b*a*b => d*d, where d = a*b

all in all reducing the 8 initial operations into the 4 described here?

(c = (d = a*b)*d)*a*c

The goal of course is to minimize the number of operations

I'm considering a suffixtree of sorts.

I'm especially interested in linear time heuristics or solutions. The '*' operations are actually matrix multiplications.


Solution

  • This whole problem is known as "Common Subexpression Elimination" or CSE. It is a slightly smaller version of the problem called "Graph Reduction" faced by the implementer of compilers for functional programming languages. Googling "Common Subexpression elimination algorithm" gives lots of solutions, though none that I can see especially for the constraints given by matrix multiplication.

    The pages linked to give a lot of references.

    My old answer is below. However, having researched a bit more, the solution is simply building a suffix tree. This can be done in O(N) time (lots of references on the wikipedia page). Having done this, the sub-expressions (c, d etc. in your question) are just nodes in the suffix tree - just pull them out.


    However, I think MarcoS is on to something with the suggestion of Longest repeating Substring, as graph reduction precedence might not allow optimisations that can be allowed here.

    sketch of algorithm:

    optimise(s):
        let sub = longestRepeatingSubstring(s).
        optimisedSub = optimise(sub)
        return s with sub replaced by optimisedSub
    

    Each run of longest repeating substring takes time N. You can probably re-use the suffix tree you build to solve the whole thing in time N.