Let's say we have a sequence A and a subsequence of A, called B. I need to determine which elements of sequence A could potentially be used to construct the subsequence B.
For example, assume A = [1, 3, 2, 1, 2, 3, 1, 3, 2] and B = [1, 3, 1, 2].

To construct B from A, we could use elements at (0-based) indexes such as 0, 1, 3, 4 or 0, 1, 6, 8 or 3, 5, 6, 8, among other combinations.

So our answer should look like [true, true, false, true, true, true, true, false, true], where true means that it's possible that the element at the i-th index was used to construct subsequence B.
It would be easy to solve with O(2^N * M) time complexity (generate all subsequences of A and compare them with B), but I need a solution that works in something like O(N log N).
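For reference, here is a minimal brute-force sketch of that exponential idea (the helper name mark_used is mine, not from the original code); it is only useful to pin down the expected output on tiny inputs:

#include <iostream>
#include <vector>
using namespace std;

// Try to match B[j..] inside A[i..]; whenever a full match is found,
// mark every index of A that participated in it.
void mark_used(const vector<int>& A, const vector<int>& B,
               int i, int j, vector<int>& chosen, vector<bool>& used) {
    if (j == (int)B.size()) {                 // matched all of B
        for (int idx : chosen) used[idx] = true;
        return;
    }
    if (i == (int)A.size()) return;           // ran out of A
    if (A[i] == B[j]) {                       // use A[i] as B[j]
        chosen.push_back(i);
        mark_used(A, B, i + 1, j + 1, chosen, used);
        chosen.pop_back();
    }
    mark_used(A, B, i + 1, j, chosen, used);  // skip A[i]
}

int main() {
    vector<int> A = {1, 3, 2, 1, 2, 3, 1, 3, 2};
    vector<int> B = {1, 3, 1, 2};
    vector<bool> used(A.size(), false);
    vector<int> chosen;
    mark_used(A, B, 0, 0, chosen, used);
    for (bool u : used) cout << u << ' ';     // prints 1 1 0 1 1 1 1 0 1
}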
I figured out a solution that seems to work when all elements of A (and therefore of B) are unique.
I iterate over A, and for every i-th element I check whether:

- A[i] exists in B; otherwise we can discard this i-th element.
- j (the index where A[i] was found in B) satisfies i >= j; otherwise we can discard this i-th element, because if j > i, then our subsequence won't maintain order.
- B[j]'s neighbour, so B[j - 1] or B[j + 1], is at the right side of A[i] (among all elements whose indexes are greater than i).

Now, because the elements are unique, we can say that this specific A[i] element can be used at the B[j] position, because there will always be one subsequence of A that is equal to B and uses it.

That's how it could look in code:
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

// Binary search in A_val_sorted (sorted by value) for `val`, accepting only an
// occurrence whose original index is >= min_idx. val == -1 is a sentinel
// meaning "no element required", which is always satisfied.
bool val_exists(const vector<pair<int, int>>& A_val_sorted, int val, int min_idx) {
    if (val == -1)
        return true;
    if (min_idx == (int)A_val_sorted.size())
        return false;
    int l = 0;
    int r = A_val_sorted.size() - 1;
    while (l <= r) {
        int mid = (l + r) / 2;
        if (A_val_sorted[mid].first == val) {
            if (A_val_sorted[mid].second >= min_idx)
                return true;
            l = mid + 1;
        } else if (A_val_sorted[mid].first < val) {
            l = mid + 1;
        } else {
            r = mid - 1;
        }
    }
    return false;
}

int main() {
    int n, m;
    cin >> n >> m;

    // (value, original index) pairs, later sorted by value
    vector<pair<int, int>> A_val_sorted(n);
    vector<pair<int, int>> B_val_sorted(m);
    vector<int> B_orig(m);  // B's values in original order

    for (int i = 0; i < n; ++i) {
        cin >> A_val_sorted[i].first;
        A_val_sorted[i].second = i;
    }
    for (int i = 0; i < m; ++i) {
        cin >> B_val_sorted[i].first;
        B_val_sorted[i].second = i;
        B_orig[i] = B_val_sorted[i].first;
    }

    sort(A_val_sorted.begin(), A_val_sorted.end());
    sort(B_val_sorted.begin(), B_val_sorted.end());

    vector<int> result(n);
    for (const pair<int, int>& a : A_val_sorted) {
        int val = a.first;
        int idx = a.second;

        // Binary search for val in B. If found at position j = B_val_sorted[mid].second,
        // then A[idx] is usable when j <= idx and the next required value B[j + 1]
        // still occurs in A somewhere to the right of idx.
        int l = 0;
        int r = m - 1;
        bool valid = false;
        while (l <= r) {
            int mid = (l + r) / 2;
            if (B_val_sorted[mid].first == val) {
                int right_ngbr_idx = B_val_sorted[mid].second + 1;
                int next_val = (right_ngbr_idx < m) ? B_orig[right_ngbr_idx] : -1;
                if (B_val_sorted[mid].second <= idx && val_exists(A_val_sorted, next_val, idx + 1)) {
                    valid = true;
                    break;
                }
                r = mid - 1;
            } else if (B_val_sorted[mid].first < val) {
                l = mid + 1;
            } else {
                r = mid - 1;
            }
        }
        result[idx] = valid;
    }

    for (int i = 0; i < n; ++i)
        cout << result[i] << ' ';
}
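For instance, fed the unique-element input n = 4, m = 2, A = 4 1 3 2, B = 1 2, the program should print 0 1 0 1, since only A[1] and A[3] can take part in building B.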
I know that for the case where elements are unique it can be done much more easily, but this approach gives an idea of how to solve the problem when elements are non-unique: instead of checking one of the neighbours, we can check the whole left and right part around B[j]. Let's say N = A.size() and M = B.size(). This solution leads to O(N log N + M log M + N * M * log M), because we can sort the sequences, store the original indexes, etc., and use binary search to find A[i] in B.

It's too slow. How can I solve it for big sequences, like N = 10^5?
There are a number of ways to do this.
The simplest is the following.
Use a greedy algorithm from the front to figure out the earliest possible place in A that each position in B could go.
Use a greedy algorithm from the back to figure out the latest possible place in A that each position in B could go.
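A minimal sketch of those two passes on the question's example (the names earliest and latest are mine, and it assumes B really is a subsequence of A):

#include <iostream>
#include <vector>
using namespace std;

// earliest[j]: smallest index in A at which B[j] can be placed (greedy front pass).
// latest[j]:   largest index in A at which B[j] can be placed (greedy back pass).
int main() {
    vector<int> A = {1, 3, 2, 1, 2, 3, 1, 3, 2};
    vector<int> B = {1, 3, 1, 2};
    int n = A.size(), m = B.size();

    vector<int> earliest(m), latest(m);
    for (int i = 0, j = 0; i < n && j < m; ++i)            // front to back
        if (A[i] == B[j])
            earliest[j++] = i;
    for (int i = n - 1, j = m - 1; i >= 0 && j >= 0; --i)  // back to front
        if (A[i] == B[j])
            latest[j--] = i;

    for (int j = 0; j < m; ++j)
        cout << "B[" << j << "]: earliest " << earliest[j]
             << ", latest " << latest[j] << '\n';
    // Prints: B[0]: earliest 0, latest 3
    //         B[1]: earliest 1, latest 5
    //         B[2]: earliest 3, latest 6
    //         B[3]: earliest 4, latest 8
}

Any occurrence of the value B[j] in A at an index between earliest[j] and latest[j] can play the role of B[j]: the prefix B[0..j-1] fits before earliest[j] and the suffix B[j+1..] fits after latest[j].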
Then in pseudocode:

create an array of False values
for each position in B
    for each place in A with that value from earliest to latest this position in B can go
        mark it True
Optimization: create a dictionary mapping each value to all of the locations in A where it is found, with the locations listed in order. This makes the inner search fast.

Second optimization: as a given position is marked True, remove it and all positions before it with the same value from the dictionary of locations. (Either they are already marked True, or no later position in B can result in them being marked True.)
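Here is a sketch of the whole thing as one program, run on the question's example (the names value_positions, earliest, latest and head are mine, and the two greedy passes from the earlier sketch are repeated so it stays self-contained); advancing a per-value cursor instead of erasing entries implements the second optimization and keeps the marking pass amortized linear:

#include <iostream>
#include <unordered_map>
#include <vector>
using namespace std;

int main() {
    // The example from the question.
    vector<int> A = {1, 3, 2, 1, 2, 3, 1, 3, 2};
    vector<int> B = {1, 3, 1, 2};
    int n = A.size(), m = B.size();

    // Greedy front pass: earliest[j] = smallest index of A usable for B[j].
    // Greedy back pass:  latest[j]   = largest index of A usable for B[j].
    vector<int> earliest(m), latest(m);
    for (int i = 0, j = 0; i < n && j < m; ++i)
        if (A[i] == B[j])
            earliest[j++] = i;
    for (int i = n - 1, j = m - 1; i >= 0 && j >= 0; --i)
        if (A[i] == B[j])
            latest[j--] = i;

    // Dictionary: value -> positions in A where it occurs, in increasing order.
    // head[value] records how far that list has already been consumed.
    unordered_map<int, vector<int>> value_positions;
    for (int i = 0; i < n; ++i)
        value_positions[A[i]].push_back(i);
    unordered_map<int, size_t> head;

    vector<bool> result(n, false);
    for (int j = 0; j < m; ++j) {
        vector<int>& pos = value_positions[B[j]];
        size_t& h = head[B[j]];
        // Skip positions before earliest[j]; they were either marked already
        // or can never be marked by a later j (second optimization).
        while (h < pos.size() && pos[h] < earliest[j])
            ++h;
        // Mark every occurrence of B[j] between earliest[j] and latest[j].
        while (h < pos.size() && pos[h] <= latest[j]) {
            result[pos[h]] = true;
            ++h;
        }
    }

    for (int i = 0; i < n; ++i)
        cout << (result[i] ? "true" : "false") << (i + 1 < n ? ", " : "\n");
}

On the question's A and B this prints true, true, false, true, true, true, true, false, true, matching the expected answer.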
With those optimizations, this becomes O(n), and therefore you'll have no problem with 10^5 elements.