Search code examples
pythonstringalgorithmdynamic-programmingsubsequence

Longest Repeating Subsequence: Edge Cases


Problem

While solving the Longest Repeating Subsequence problem using bottom-up dynamic programming, I started running into an edge case whenever a letter was repeated an odd number of times.

The goal is to find the longest subsequence that occurs twice in the string using elements at different indices. The ranges can overlap, but the indices should be disjoint (i.e., str[1], str[4] and str[2], str[6] can be a solution, but not str[1], str[2] and str[2], str[3]).

EDIT: Misunderstood example. (See the comment)

Minimum Reproducible Example

s = 'AXXXA'

n = len(s)

dp = [['' for i in range(n + 1)] for j in range(n + 1)]

for i in range(1, n + 1):
  for j in range(1, n + 1):
    if (i != j and s[i - 1] == s[j - 1]):
      dp[i][j] = dp[i - 1][j - 1] + s[i - 1]
    else:
      dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

print(dp[n][n])

Question

Any pointers on how to avoid this?

With input s = 'AXXXA', the answer should be either A or X, but the final result returns XX, apparently pairing up the third X with both the first X and the second X.

False Start

I don't want to add a check on a match (s[i - 1] == s[j - 1]) to see if s[i - 1] in dp[i - 1][j - 1] because another input might be something like AAJDDAJJTATA, which must add the A twice.


Solution

  • Actually, your initial algorithm and its answer are correct (... but this is a good question because others might confuse what an LRS means).

    Given your input (in), the subsequences (s1, s2) are:

    in: AXXXA
    s1:  XX
    s2:   XX
    

    So XX (length 2) is indeed the correct answer here.

    X would be the correct answer for the problem's non-overlapping version, where the ranges - not just individual indices - must be disjoint.