Search code examples
stringalgorithmsubsequence

Minimum number of subsequences required to interleave one string to another


I saw the original question Minimum of subsequences required to convert one string to another, and it's very similar to this quesion 1055. Shortest Way to Form String on LeetCode.

Description:

Given two strings source and target, return the minimum number of subsequences of source such that their concatenation equals target. If the task is impossible, return -1.

Example 1:

Input: source = "abc", target = "abcbc"

Output: 2

Explanation: The target "abcbc" can be formed by "abc" and "bc", which are subsequences of source "abc".

Supposed s' is a subsequence of source, so this problem is to find s'_1s'_2...s'_k to form target. My question is how to find the minimum number of subsequences required to interleave one string to another.

eg:

Input: source = "adbsc", target = "addsbc"

Output: 2

Explanation:

step1: s'1 = adbc, then target' = ds
step2: s'2 = ds, then target' = ""

I don't know whether it can slove this problem by removing the longest common subsequence of source and target' to form t', and repeat it untill t' = "".

Here is my code:

class Solution:
    def shortestWay(self, s: str, t: str) -> int:

        def lcs(s: str, t: str) -> str:
            m, n = len(s), len(t)
            dp = [[0] * (n + 1) for _ in range(m + 1)]
            for i in range(1, m + 1):
                for j in range(1, n + 1):
                    if s[i-1] == t[j-1]:
                        dp[i][j] = dp[i-1][j-1] + 1
                    else:
                        dp[i][j] = max(dp[i-1][j], dp[i][j-1])
            choose = [False] * n
            i, j = m, n
            while i + j != 0:
                if s[i-1] == t[j-1]:
                    choose[j-1] = True
                    i, j = i - 1, j - 1
                elif dp[i][j] == dp[i-1][j]:
                    i -= 1
                elif dp[i][j] == dp[i][j-1]:
                    j -= 1
            print(f'lcs({s}, {t}) is', ''.join(t[i] for i, v in enumerate(choose) if v))
            return ''.join(t[i] for i, v in enumerate(choose) if not v)
        ans = 0
        while t:
            ans += 1
            t1 = lcs(s, t)
            print(repr(t1))
            if t1 == t:
                return -1
            t = t1
        return ans
"""
lcs(adbsc, addsbc) is adbc
'ds'
lcs(adbsc, ds) is ds
''
"""

Is anybody can help me to proof/solve this problem, thanks!


Solution

  • Your working for your second example (source = "abc", target = "abcbc") doesn't make sense to me. Your algorithm idea (repeatedly removing the LCS from target) produces an optimal solution -- you just need to prove that no other solution can be better.

    Proof sketch

    Consider an optimal solution, containing OPT segments. If it equals your algorithm's solution then we're done; otherwise, it must consist of some number of common subsequence segments, the first k (possibly k=0) of which match the segments produced by your algorithm, followed by a (k+1)-th segment that is strictly shorter than your algorithm's (k+1)-th segment (since your algorithm always chooses the longest possible segment to add at each stage), followed by some number (possibly zero) of remaining segments.

    Notice that if some subsequence of a string S is equal to a string T, then for any given suffix of T, we can certainly find a subsequence of S equal to it -- all we need to do is drop some of the initial characters from the subsequence.

    So, getting back to our original problem: Some initial part of the remaining segments (possibly involving multiple segments) can be trimmed off to produce a list of segments that, when appended to the first k+1 segments produced by your algorithm, gives a solution that:

    • agrees with the first k+1 segments of your algorithm's solution, and
    • has segment count no worse than OPT.

    The optimal solution we started with agreed with your algorithm's solution on the first k segments; this new optimal solution agrees on at least one more segment. If the new solution is not yet completely equal to your algorithm's solution, then the analysis can be repeated, producing a new solution that agrees with it on at least the first k+2 segments. This can be repeated until we ultimately produce a solution that agrees completely with your algorithm's solution and has length OPT. Since we made no assumptions about the input instance, this proves your algorithm produces an optimal solution on every input instance.