Search code examples
arraysalgorithmdynamic-programminggreedylis

Longest K Sequential Increasing Subsequences


Why I created a duplicate thread

I created this thread after reading Longest increasing subsequence with K exceptions allowed. I realised that the person who was asking the question hadn't really understood the problem, because he was referring to a link which solves the "Longest Increasing sub-array with one change allowed" problem. So the answers he got were actually irrelevant to LIS problem.

Description of the problem

Suppose that an array A is given with length N. Find the longest increasing sub-sequence with K exceptions allowed.

Example
1) N=9 , K=1

A=[3,9,4,5,8,6,1,3,7]

Answer: 7

Explanation:

Longest increasing subsequence is : 3,4,5,8(or 6),1(exception),3,7 -> total=7

  1. N=11 , K=2

A=[5,6,4,7,3,9,2,5,1,8,7]

answer: 8

What I have done so far...

If K=1 then only one exception is allowed. If the known algorithm for computing the Longest Increasing Subsequence in O(NlogN) is used (click here to see this algorithm), then we can compute the LIS starting from A[0] to A[N-1] for each element of array A. We save the results in a new array L with size N. Looking into example n.1 the L array would be: L=[1,2,2,3,4,4,4,4,5].

Using the reverse logic, we compute array R, each element of which contains the current Longest Decreasing Sequence from N-1 to 0.

The LIS with one exception is just sol=max(sol,L[i]+R[i+1]), where sol is initialized as sol=L[N-1]. So we compute LIS from 0 until an index i (exception), then stop and start a new LIS until N-1.

A=[3,9,4,5,8,6,1,3,7]

L=[1,2,2,3,4,4,4,4,5]

R=[5,4,4,3,3,3,3,2,1]

Sol = 7

-> step by step explanation:

init: sol = L[N]= 5

i=0 : sol = max(sol,1+4) = 5 
i=1 : sol = max(sol,2+4) = 6
i=2 : sol = max(sol,2+3) = 6
i=3 : sol = max(sol,3+3) = 6
i=4 : sol = max(sol,4+3) = 7
i=4 : sol = max(sol,4+3) = 7
i=4 : sol = max(sol,4+2) = 7
i=5 : sol = max(sol,4+1) = 7

Complexity : O( NlogN + NlogN + N ) = O(NlogN)

because arrays R, L need NlogN time to compute and we also need Θ(N) in order to find sol.

Code for k=1 problem

#include <stdio.h>
#include <vector>

std::vector<int> ends;

int index_search(int value, int asc) {
    int l = -1;
    int r = ends.size() - 1;
    while (r - l > 1) { 
        int m = (r + l) / 2; 
        if (asc && ends[m] >= value) 
            r = m; 
        else if (asc && ends[m] < value)
            l = m;
        else if (!asc && ends[m] <= value)
            r = m;
        else
            l = m;
    } 
    return r;
}

int main(void) {
    int n, *S, *A, *B, i, length, idx, max;

    scanf("%d",&n);
    S = new int[n];
    L = new int[n];
    R = new int[n];
    for (i=0; i<n; i++) {
        scanf("%d",&S[i]);
    }

    ends.push_back(S[0]);
    length = 1;
    L[0] = length;
    for (i=1; i<n; i++) {
        if (S[i] < ends[0]) {
            ends[0] = S[i];
        }
        else if (S[i] > ends[length-1]) {
            length++;
            ends.push_back(S[i]);
        }
        else {
            idx = index_search(S[i],1);
            ends[idx] = S[i];
        }
        L[i] = length;
    }

    ends.clear();
    ends.push_back(S[n-1]);
    length = 1;
    R[n-1] = length;
    for (i=n-2; i>=0; i--) {
        if (S[i] > ends[0]) {
            ends[0] = S[i];
        }
        else if (S[i] < ends[length-1]) {
            length++;
            ends.push_back(S[i]);
        }
        else {
            idx = index_search(S[i],0);
            ends[idx] = S[i];
        }
        R[i] = length;
    }

    max = A[n-1];
    for (i=0; i<n-1; i++) {
        max = std::max(max,(L[i]+R[i+1]));
    }

    printf("%d\n",max);
    return 0;
}

Generalization to K exceptions

I have provided an algorithm for K=1. I have no clue how to change the above algorithm to work for K exceptions. I would be glad if someone could help me.


Solution

  • This answer is modified from my answer to a similar question at Computer Science Stackexchange.

    The LIS problem with at most k exceptions admits a O(n log² n) algorithm using Lagrangian relaxation. When k is larger than log n this improves asymptotically on the O(nk log n) DP, which we will also briefly explain.

    Let DP[a][b] denote the length of the longest increasing subsequence with at most b exceptions (positions where the previous integer is larger than the next one) ending at element b a. This DP is not involved in the algorithm, but defining it makes proving the algorithm easier.

    For convenience we will assume that all elements are distinct and that the last element in the array is its maximum. Note that this does not limit us, as we can just add m / 2n to the mth appearance of every number, and append infinity to the array and subtract one from the answer. Let V be the permutation for which 1 <= V[i] <= n is the value of the ith element.

    To solve the problem in O(nk log n), we maintain the invariant that DP[a][b] has been calculated for b < j. Loop j from 0 to k, at the jth iteration calculating DP[a][j] for all a. To do this, loop i from 1 to n. We maintain the maximum of DP[x][j-1] over x < i and a prefix maximum data structure that at index i will have DP[x][j] at position V[x] for x < i, and 0 at every other position.

    We have DP[i][j] = 1 + max(DP[i'][j], DP[x][j-1]) where we go over i', x < i, V[i'] < V[i]. The prefix maximum of DP[x][j-1] gives us the maximum of terms of the second type, and querying the prefix maximum data structure for prefix [0, V[i]] gives us the maximum of terms of the first type. Then update the prefix maximum and prefix maximum data structure.

    Here is a C++ implementation of the algorithm. Note that this implementation does not assume that the last element of the array is its maximum, or that the array contains no duplicates.

    
    #include <iostream>
    #include <vector>
    #include <algorithm>
    using namespace std;
    
    // Fenwick tree for prefix maximum queries
    class Fenwick {
        private:
            vector<int> val;
        public:
            Fenwick(int n) : val(n+1, 0) {}
    
            // Sets value at position i to maximum of its current value and 
            void inc(int i, int v) {
                for (++i; i < val.size(); i += i & -i) val[i] = max(val[i], v);
            }
    
            // Calculates prefix maximum up to index i
            int get(int i) {
                int res = 0;
                for (++i; i > 0; i -= i & -i) res = max(res, val[i]);
                return res;
            }
    };
    
    // Binary searches index of v from sorted vector
    int bins(const vector<int>& vec, int v) {
        int low = 0;
        int high = (int)vec.size() - 1;
        while(low != high) {
            int mid = (low + high) / 2;
            if (vec[mid] < v) low = mid + 1;
            else high = mid;
        }
        return low;
    }
    
    // Compresses the range of values to [0, m), and returns m
    int compress(vector<int>& vec) {
        vector<int> ord = vec;
        sort(ord.begin(), ord.end());
        ord.erase(unique(ord.begin(), ord.end()), ord.end());
        for (int& v : vec) v = bins(ord, v);
        return ord.size();
    }
    
    // Returns length of longest strictly increasing subsequence with at most k exceptions
    int lisExc(int k, vector<int> vec) {
        int n = vec.size();
        int m = compress(vec);
        vector<int> dp(n, 0);
        for (int j = 0;; ++j) {
            Fenwick fenw(m+1); // longest subsequence with at most j exceptions ending at this value
            int max_exc = 0; // longest subsequence with at most j-1 exceptions ending before this
            for (int i = 0; i < n; ++i) {
                int off = 1 + max(max_exc, fenw.get(vec[i]));
                max_exc = max(max_exc, dp[i]);
    
                dp[i] = off;
                fenw.inc(vec[i]+1, off);
            }
            if (j == k) return fenw.get(m);
        }
    }
    
    int main() {
        int n, k;
        cin >> n >> k;
    
        vector<int> vec(n);
        for (int i = 0; i < n; ++i) cin >> vec[i];
    
        int res = lisExc(k, vec);
        cout << res << '\n';
    }
    
    

    Now we will return to the O(n log² n) algorithm. Select some integer 0 <= r <= n. Define DP'[a][r] = max(DP[a][b] - rb), where the maximum is taken over b, MAXB[a][r] as the maximum b such that DP'[a][r] = DP[a][b] - rb, and MINB[a][r] similarly as the minimum such b. We will show that DP[a][k] = DP'[a][r] + rk if and only if MINB[a][r] <= k <= MAXB[a][r]. Further, we will show that for any k exists an r for which this inequality holds.

    Note that MINB[a][r] >= MINB[a][r'] and MAXB[a][r] >= MAXB[a][r'] if r < r', hence if we assume the two claimed results, we can do binary search for the r, trying O(log n) values. Hence we achieve complexity O(n log² n) if we can calculate DP', MINB and MAXB in O(n log n) time.

    To do this, we will need a segment tree that stores tuples P[i] = (v_i, low_i, high_i), and supports the following operations:

    1. Given a range [a, b], find the maximum value in that range (maximum v_i, a <= i <= b), and the minimum low and maximum high paired with that value in the range.

    2. Set the value of the tuple P[i]

    This is easy to implement with complexity O(log n) time per operation assuming some familiarity with segment trees. You can refer to the implementation of the algorithm below for details.

    We will now show how to compute DP', MINB and MAXB in O(n log n). Fix r. Build the segment tree initially containing n+1 null values (-INF, INF, -INF). We maintain that P[V[j]] = (DP'[j], MINB[j], MAXB[j]) for j less than the current position i. Set DP'[0] = 0, MINB[0] = 0 and MAXB[0] to 0 if r > 0, otherwise to INF and P[0] = (DP'[0], MINB[0], MAXB[0]).

    Loop i from 1 to n. There are two types of subsequences ending at i: those where the previous element is greater than V[i], and those where it is less than V[i]. To account for the second kind, query the segment tree in the range [0, V[i]]. Let the result be (v_1, low_1, high_1). Set off1 = (v_1 + 1, low_1, high_1). For the first kind, query the segment tree in the range [V[i], n]. Let the result be (v_2, low_2, high_2). Set off2 = (v_2 + 1 - r, low_2 + 1, high_2 + 1), where we incur the penalty of r for creating an exception.

    Then we combine off1 and off2 into off. If off1.v > off2.v set off = off1, and if off2.v > off1.v set off = off2. Otherwise, set off = (off1.v, min(off1.low, off2.low), max(off1.high, off2.high)). Then set DP'[i] = off.v, MINB[i] = off.low, MAXB[i] = off.high and P[i] = off.

    Since we make two segment tree queries at every i, this takes O(n log n) time in total. It is easy to prove by induction that we compute the correct values DP', MINB and MAXB.

    So in short, the algorithm is:

    1. Preprocess, modifying values so that they form a permutation, and the last value is the largest value.

    2. Binary search for the correct r, with initial bounds 0 <= r <= n

    3. Initialise the segment tree with null values, set DP'[0], MINB[0] and MAXB[0].

    4. Loop from i = 1 to n, at step i

      • Querying ranges [0, V[i]] and [V[i], n] of the segment tree,
      • calculating DP'[i], MINB[i] and MAXB[i] based on those queries, and
      • setting the value at position V[i] in the segment tree to the tuple (DP'[i], MINB[i], MAXB[i]).
    5. If MINB[n][r] <= k <= MAXB[n][r], return DP'[n][r] + kr - 1.

    6. Otherwise, if MAXB[n][r] < k, the correct r is less than the current r. If MINB[n][r] > k, the correct r is greater than the current r. Update the bounds on r and return to step 1.

    Here is a C++ implementation for this algorithm. It also finds the optimal subsequence.

        #include <iostream>
        #include <vector>
        #include <algorithm>
        using namespace std;
        using ll = long long;
        const int INF = 2 * (int)1e9;
    
        pair<ll, pair<int, int>> combine(pair<ll, pair<int, int>> le, pair<ll, pair<int, int>> ri) {
            if (le.first < ri.first) swap(le, ri);
            if (ri.first == le.first) {
                le.second.first = min(le.second.first, ri.second.first);
                le.second.second = max(le.second.second, ri.second.second);
            }
            return le;
        }
    
        // Specialised range maximum segment tree
        class SegTree {
            private:
                vector<pair<ll, pair<int, int>>> seg;
                int h = 1;
    
                pair<ll, pair<int, int>> recGet(int a, int b, int i, int le, int ri) const {
                    if (ri <= a || b <= le) return {-INF, {INF, -INF}};
                    else if (a <= le && ri <= b) return seg[i];
                    else return combine(recGet(a, b, 2*i, le, (le+ri)/2), recGet(a, b, 2*i+1, (le+ri)/2, ri));
                }
            public:
                SegTree(int n) {
                    while(h < n) h *= 2;
                    seg.resize(2*h, {-INF, {INF, -INF}});
                }
                void set(int i, pair<ll, pair<int, int>> off) {
                    seg[i+h] = combine(seg[i+h], off);
                    for (i += h; i > 1; i /= 2) seg[i/2] = combine(seg[i], seg[i^1]);
                }
                pair<ll, pair<int, int>> get(int a, int b) const {
                    return recGet(a, b+1, 1, 0, h);
                }
        };
    
        // Binary searches index of v from sorted vector
        int bins(const vector<int>& vec, int v) {
            int low = 0;
            int high = (int)vec.size() - 1;
            while(low != high) {
                int mid = (low + high) / 2;
                if (vec[mid] < v) low = mid + 1;
                else high = mid;
            }
            return low;
        }
    
        // Finds longest strictly increasing subsequence with at most k exceptions in O(n log^2 n)
        vector<int> lisExc(int k, vector<int> vec) {
            // Compress values
            vector<int> ord = vec;
            sort(ord.begin(), ord.end());
            ord.erase(unique(ord.begin(), ord.end()), ord.end());
            for (auto& v : vec) v = bins(ord, v) + 1;
    
            // Binary search lambda
            int n = vec.size();
            int m = ord.size() + 1;
            int lambda_0 = 0;
            int lambda_1 = n;
            while(true) {
                int lambda = (lambda_0 + lambda_1) / 2;
                SegTree seg(m);
                if (lambda > 0) seg.set(0, {0, {0, 0}});
                else seg.set(0, {0, {0, INF}});
    
                // Calculate DP
                vector<pair<ll, pair<int, int>>> dp(n);
                for (int i = 0; i < n; ++i) {
                    auto off0 = seg.get(0, vec[i]-1); // previous < this
                    off0.first += 1;
    
                    auto off1 = seg.get(vec[i], m-1); // previous >= this
                    off1.first += 1 - lambda;
                    off1.second.first += 1;
                    off1.second.second += 1;
    
                    dp[i] = combine(off0, off1);
                    seg.set(vec[i], dp[i]);
                }
    
                // Is min_b <= k <= max_b?
                auto off = seg.get(0, m-1);
                if (off.second.second < k) {
                    lambda_1 = lambda - 1;
                } else if (off.second.first > k) {
                    lambda_0 = lambda + 1;
                } else {
                    // Construct solution
                    ll r = off.first + 1;
                    int v = m;
                    int b = k;
                    vector<int> res;
                    for (int i = n-1; i >= 0; --i) {
                        if (vec[i] < v) {
                            if (r == dp[i].first + 1 && dp[i].second.first <= b && b <= dp[i].second.second) {
                                res.push_back(i);
                                r -= 1;
                                v = vec[i];
                            }
                        } else {
                            if (r == dp[i].first + 1 - lambda && dp[i].second.first <= b-1 && b-1 <= dp[i].second.second) {
                                res.push_back(i);
                                r -= 1 - lambda;
                                v = vec[i];
                                --b;
                            }
                        }
                    }
                    reverse(res.begin(), res.end());
                    return res;
                }
            }
        }
    
        int main() {
            int n, k;
            cin >> n >> k;
    
            vector<int> vec(n);
            for (int i = 0; i < n; ++i) cin >> vec[i];
    
            vector<int> ans = lisExc(k, vec);
            for (auto i : ans) cout << i+1 << ' ';
            cout << '\n';
        }
    

    We will now prove the two claims. We wish to prove that

    1. DP'[a][r] = DP[a][b] - rb if and only if MINB[a][r] <= b <= MAXB[a][r]

    2. For all a, k there exists an integer r, 0 <= r <= n, such that MINB[a][r] <= k <= MAXB[a][r]

    Both of these follow from the concavity of the problem. Concavity means that DP[a][k+2] - DP[a][k+1] <= DP[a][k+1] - DP[a][k] for all a, k. This is intuitive: the more exceptions we are allowed to make, the less allowing one more helps us.

    Fix a and r. Set f(b) = DP[a][b] - rb, and d(b) = f(b+1) - f(b). We have d(k+1) <= d(k) from the concavity of the problem. Assume x < y and f(x) = f(y) >= f(i) for all i. Hence d(x) <= 0, thus d(i) <= 0 for i in [x, y). But f(y) = f(x) + d(x) + d(x + 1) + ... + d(y - 1), hence d(i) = 0 for i in [x, y). Hence f(y) = f(x) = f(i) for i in [x, y]. This proves the first claim.

    To prove the second, set r = DP[a][k+1] - DP[a][k] and define f, d as previously. Then d(k) = 0, hence d(i) >= 0 for i < k and d(i) <= 0 for i > k, hence f(k) is maximal as desired.

    Proving concavity is more difficult. For a proof, see my answer at cs.stackexchange.