Suppose we have 2 sequences(Arrays) A[1..m]
and B[1..n]
of integers.
How can we find the length of the biggest common contiguous subsequence of A
and B
in O(nlogn + mlogm + k^2) where k is the number of (x, y) tuples where A[x] = B[y]
?
This is an example input:
A={1,2,3,4,5}
B={2,4,5,9,11,20}
output: 2
Note that it is not guaranteed that arrays are sorted:
A={2,7,1,8,11}
B={1,3,7,1,11,2}
output: 2
Let's refer to those "tuples (x,y) where A[x] == B[y]
as "correspondences".
A common subsequence, then, corresponds to a list of correspondences C
where there is are no two correspondences C[i]
and C[j]
such that C[i].x < C[j].x AND C[i].y > C[j].y
or vice-versa.
First, make a list of all correspondences, sorted by x then y. We can easily do this within the required complexity.
Then, we will iterate through this list and determine, for each correspondence, the length of longest subsequence that ends with that correspondence. The largest result is the problem solution.
We will maintain an array BEST[y]
which stores, for each y
, the best result so far where the final correspondence c
has c.y == y
. Initially this is all zeros.
For each correspondence c
:
BEST[y]
such that y <= c.y
, and add 1 to get the result
for c
. Remember the largest of these results, as it will be the solution to the problem.result > BEST[c.y]
, set BEST[c.y] = result
That's it... but, in order to meet your complexity requirement, that search in step (1) needs to be done in O(k) time, even if k is much smaller than max(y). You can do that by keeping a list of the non-zero values instead of an array. There can be at most k of those, so you can search and update the list in O(k) time easily.
If you keep those entries in a binary search tree, and additionally remember the maximum value in each subtree, then you can reduce the complexity to O( nlogn + mlogm + klogk ).