Search code examples
stringalgorithmdynamic-programmingdiscrete-mathematicslcs

Shortest common SuperSequence


The problem is that given 2 strings X and Y , we need to find length of the shortest sequence Z such that both the strings occur as sub sequence in Z. Now, I get the intuition that length = |X| + |Y| -|LCS(X,Y)|. But how do we prove it ?

Ex: X = AGGTAB , Y = GXTXAYB , then Z = AGXGTXAYB and |Z| = 9 . LCS(X,Y) = GTAB

Reference : Link 1 Link 2


Solution

  • Let be string 1 and be string 2.

    Let S be any shortest super sequence which consist of both X and Y as sequences. Let's try to create such a sequence.

    Clearly , X exists as a sequence in S. Hence, initially S can be shown as :


    Note: '...' means that place can be empty or can be used to place characters of Y into this to make it shortest super sequence consisting of both X and Y.

    Now, this |S| has to be minimum. We need to inject only minimum characters from Y into this S to make it shortest super sequence consisting of both X and Y. Also, note the condition is that Y needs to appear as a sub sequence and not substring. Hence, if I find any sequence of characters of Y which also occur in X, then those characters of Y need not be injected.

    Hence, for |S| to be minimum , we shown that minimum characters of Y need to be injected so that Y occurs as a sequence. For this, we find such longest sequence of X which also occur in Y which in other words = LCS(X,Y).

    |S| = |X| + ( |Y| - |LCS(X,Y)| ) = |X| +|Y| - |LCS(X,Y)|

    Now, multiple LCS(X,Y) exists. Take any and construct the sequence.