Let's say we have a sequence A and a subsequence of A, called B. I need to determine which elements of sequence A could potentially be used to construct the subsequence B.
For example, assume A = [1, 3, 2, 1, 2, 3, 1, 3, 2] and B = [1, 3, 1, 2].

To construct B from A, we could use elements at (0-based) indexes such as 0, 1, 3, 4 or 0, 1, 6, 8 or 3, 5, 6, 8, among other combinations.

So our answer should look like [true, true, false, true, true, true, true, false, true], where true means that it's possible that the element at the i-th index was used to construct subsequence B.
It would be easy to solve with O(2^N * M) time complexity (generate all subsequences of A and compare them with B), but I need a solution that works in something like O(N log N).
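For reference, here is a minimal brute-force sketch of that exponential idea (the helper name mark_used is mine, not from the original code); it is only useful to pin down the expected output on tiny inputs:

#include <iostream>
#include <vector>
using namespace std;

// Try to match B[j..] inside A[i..]; whenever a full match is found,
// mark every index of A that participated in it.
void mark_used(const vector<int>& A, const vector<int>& B,
               int i, int j, vector<int>& chosen, vector<bool>& used) {
    if (j == (int)B.size()) {                 // matched all of B
        for (int idx : chosen) used[idx] = true;
        return;
    }
    if (i == (int)A.size()) return;           // ran out of A
    if (A[i] == B[j]) {                       // use A[i] as B[j]
        chosen.push_back(i);
        mark_used(A, B, i + 1, j + 1, chosen, used);
        chosen.pop_back();
    }
    mark_used(A, B, i + 1, j, chosen, used);  // skip A[i]
}

int main() {
    vector<int> A = {1, 3, 2, 1, 2, 3, 1, 3, 2};
    vector<int> B = {1, 3, 1, 2};
    vector<bool> used(A.size(), false);
    vector<int> chosen;
    mark_used(A, B, 0, 0, chosen, used);
    for (bool u : used) cout << u << ' ';     // prints 1 1 0 1 1 1 1 0 1
}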
I figured out a solution that seems to work when all elements of A (and therefore of B) are unique.
I iterate over A, and for every i-th element I check whether:

- A[i] exists in B; otherwise we can discard this i-th element.
- j (the index where A[i] was found in B) satisfies i >= j; otherwise we can discard this i-th element, because if j > i, then our subsequence won't maintain order.
- B[j]'s neighbour, so B[j - 1] or B[j + 1], is at the right side of A[i] (among all elements whose indexes are greater than i).

Now, because the elements are unique, we can say that this specific A[i] element can be used at the B[j] position, because there will always be one subsequence of A that is equal to B and uses it.

That's how it could look in code:
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

// Binary search in A_val_sorted (sorted by value) for `val`, accepting only an
// occurrence whose original index is >= min_idx. val == -1 is a sentinel
// meaning "no element required", which is always satisfied.
bool val_exists(const vector<pair<int, int>>& A_val_sorted, int val, int min_idx) {
    if (val == -1)
        return true;
    if (min_idx == (int)A_val_sorted.size())
        return false;
    int l = 0;
    int r = A_val_sorted.size() - 1;
    while (l <= r) {
        int mid = (l + r) / 2;
        if (A_val_sorted[mid].first == val) {
            if (A_val_sorted[mid].second >= min_idx)
                return true;
            l = mid + 1;
        } else if (A_val_sorted[mid].first < val) {
            l = mid + 1;
        } else {
            r = mid - 1;
        }
    }
    return false;
}

int main() {
    int n, m;
    cin >> n >> m;

    // (value, original index) pairs, later sorted by value
    vector<pair<int, int>> A_val_sorted(n);
    vector<pair<int, int>> B_val_sorted(m);
    vector<int> B_orig(m);  // B's values in original order

    for (int i = 0; i < n; ++i) {
        cin >> A_val_sorted[i].first;
        A_val_sorted[i].second = i;
    }
    for (int i = 0; i < m; ++i) {
        cin >> B_val_sorted[i].first;
        B_val_sorted[i].second = i;
        B_orig[i] = B_val_sorted[i].first;
    }

    sort(A_val_sorted.begin(), A_val_sorted.end());
    sort(B_val_sorted.begin(), B_val_sorted.end());

    vector<int> result(n);
    for (const pair<int, int>& a : A_val_sorted) {
        int val = a.first;
        int idx = a.second;

        // Binary search for val in B. If found at position j = B_val_sorted[mid].second,
        // then A[idx] is usable when j <= idx and the next required value B[j + 1]
        // still occurs in A somewhere to the right of idx.
        int l = 0;
        int r = m - 1;
        bool valid = false;
        while (l <= r) {
            int mid = (l + r) / 2;
            if (B_val_sorted[mid].first == val) {
                int right_ngbr_idx = B_val_sorted[mid].second + 1;
                int next_val = (right_ngbr_idx < m) ? B_orig[right_ngbr_idx] : -1;
                if (B_val_sorted[mid].second <= idx && val_exists(A_val_sorted, next_val, idx + 1)) {
                    valid = true;
                    break;
                }
                r = mid - 1;
            } else if (B_val_sorted[mid].first < val) {
                l = mid + 1;
            } else {
                r = mid - 1;
            }
        }
        result[idx] = valid;
    }

    for (int i = 0; i < n; ++i)
        cout << result[i] << ' ';
}
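For instance, fed the unique-element input n = 4, m = 2, A = 4 1 3 2, B = 1 2, the program should print 0 1 0 1, since only A[1] and A[3] can take part in building B.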
I know that for the case where elements are unique it can be done much more easily, but this approach gives an idea of how to solve the problem when elements are non-unique: instead of checking one of the neighbours, we can check the whole left and right part around B[j]. Let's say N = A.size() and M = B.size(). This solution leads to O(N log N + M log M + N * M * log M), because we can sort the sequences, store the original indexes, etc., and use binary search to find A[i] in B.

It's too slow. How can I solve it for big sequences, like N = 10^5?
There are a number of ways to do this.
The simplest is the following.
Use a greedy algorithm from the front to figure out the earliest possible place in A that each position in B could go.
Use a greedy algorithm from the back to figure out the latest possible place in A that each position in B could go.
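A minimal sketch of those two passes on the question's example (the names earliest and latest are mine, and it assumes B really is a subsequence of A):

#include <iostream>
#include <vector>
using namespace std;

// earliest[j]: smallest index in A at which B[j] can be placed (greedy front pass).
// latest[j]:   largest index in A at which B[j] can be placed (greedy back pass).
int main() {
    vector<int> A = {1, 3, 2, 1, 2, 3, 1, 3, 2};
    vector<int> B = {1, 3, 1, 2};
    int n = A.size(), m = B.size();

    vector<int> earliest(m), latest(m);
    for (int i = 0, j = 0; i < n && j < m; ++i)            // front to back
        if (A[i] == B[j])
            earliest[j++] = i;
    for (int i = n - 1, j = m - 1; i >= 0 && j >= 0; --i)  // back to front
        if (A[i] == B[j])
            latest[j--] = i;

    for (int j = 0; j < m; ++j)
        cout << "B[" << j << "]: earliest " << earliest[j]
             << ", latest " << latest[j] << '\n';
    // Prints: B[0]: earliest 0, latest 3
    //         B[1]: earliest 1, latest 5
    //         B[2]: earliest 3, latest 6
    //         B[3]: earliest 4, latest 8
}

Any occurrence of the value B[j] in A at an index between earliest[j] and latest[j] can play the role of B[j]: the prefix B[0..j-1] fits before earliest[j] and the suffix B[j+1..] fits after latest[j].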
Then in pseudocode:

create an array of False values
for each position in B
    for each place in A with that value from earliest to latest this position in B can go
        mark it True
Optimization: create a dictionary mapping each value to all of the locations in A where it is found, with the locations listed in order. This makes the inner search fast.

Second optimization: as a given position is marked True, remove it and all positions before it with the same value from the dictionary of locations. (Either they are already marked True, or no later position in B can result in them being marked True.)
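Here is a sketch of the whole thing as one program, run on the question's example (the names value_positions, earliest, latest and head are mine, and the two greedy passes from the earlier sketch are repeated so it stays self-contained); advancing a per-value cursor instead of erasing entries implements the second optimization and keeps the marking pass amortized linear:

#include <iostream>
#include <unordered_map>
#include <vector>
using namespace std;

int main() {
    // The example from the question.
    vector<int> A = {1, 3, 2, 1, 2, 3, 1, 3, 2};
    vector<int> B = {1, 3, 1, 2};
    int n = A.size(), m = B.size();

    // Greedy front pass: earliest[j] = smallest index of A usable for B[j].
    // Greedy back pass:  latest[j]   = largest index of A usable for B[j].
    vector<int> earliest(m), latest(m);
    for (int i = 0, j = 0; i < n && j < m; ++i)
        if (A[i] == B[j])
            earliest[j++] = i;
    for (int i = n - 1, j = m - 1; i >= 0 && j >= 0; --i)
        if (A[i] == B[j])
            latest[j--] = i;

    // Dictionary: value -> positions in A where it occurs, in increasing order.
    // head[value] records how far that list has already been consumed.
    unordered_map<int, vector<int>> value_positions;
    for (int i = 0; i < n; ++i)
        value_positions[A[i]].push_back(i);
    unordered_map<int, size_t> head;

    vector<bool> result(n, false);
    for (int j = 0; j < m; ++j) {
        vector<int>& pos = value_positions[B[j]];
        size_t& h = head[B[j]];
        // Skip positions before earliest[j]; they were either marked already
        // or can never be marked by a later j (second optimization).
        while (h < pos.size() && pos[h] < earliest[j])
            ++h;
        // Mark every occurrence of B[j] between earliest[j] and latest[j].
        while (h < pos.size() && pos[h] <= latest[j]) {
            result[pos[h]] = true;
            ++h;
        }
    }

    for (int i = 0; i < n; ++i)
        cout << (result[i] ? "true" : "false") << (i + 1 < n ? ", " : "\n");
}

On the question's A and B this prints true, true, false, true, true, true, true, false, true, matching the expected answer.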
With those optimizations, this becomes O(n), and therefore you'll have no problem with 10^5 elements.