Search code examples
c++algorithmtime-complexityheapavl-tree

Find shortest subarrays A[0:L], B[0:L] where M different elements in A are bigger than M different elements in B (time complexity)


I need to find minimum subarray of size L, A[0:L], B[0:L] such that there are M different elements in A that are bigger than M different elements in B. Like A[i] > B[j] counts but I cannot use A[i] or B[j] again. Also subarrays must start from the beginning of array.

The homework is about AVLTrees and Heap so I guess a solution would involve one of them (For the example below, I used the priority_queue from stl but actually I am not allowed to use any container from any library so I already have min-heap implementation but for easy-understanding I used the stl equivalent). Also I am expected to solve the question using AVL Tree or Heap.

Ps. I have found out that arrays can contain duplicate elements.

The size of A and B are the same which is: N.

I need to find a solution that has time complexity O(N * logN * logM)

#include <iostream>
#include <queue>
#include <span>

int minLenOfSubarrays(std::span<int> aArr, std::span<int> bArr, int M)
{
    const int N = aArr.size();
    int count = 0;
    int L = 0;

    int left = M;
    int right = N;
    while(left <= right){ //log N
        //Heap a; //min-heap
        //Heap b; //min-heap
        int mid = left+(right-left)/2;
        std::priority_queue<int,std::vector<int>,std::greater<int>> a;
        std::priority_queue<int,std::vector<int>,std::greater<int>> b;
        count = 0;
        for(int j = 0; j < mid; j++){ //N log N
            a.push(aArr[j]);
            b.push(bArr[j]);
        }
        while(!a.empty()){ // N log N
            if(a.top() > b.top()){
                count++;
                a.pop();
                b.pop();
            }
            else{
                a.pop();
            }
        }
        if(count >= M){
            L = mid;
            right = mid-1;
        }
        else{
            left = mid+1;
        }
    }
    return L;
}

int main(int argc, char* argv[]){

    int aArr[] = {2,4,10,6,1,11};
    int bArr[] = {3,5,8,9,7,12};
    int M = 3;
    
    std::cout << minLenOfSubbarays(aArr, bArr, M) << '\n';
}

I have tried a solution using min-heaps and binary search but the complexity that I could reach is max O(N*logN*logN).


Solution

  • In your binary search, for each mid do this:

    1. Compute the M largest values in A[0:mid], in ascending order. Can be done in O(mid * logM) with an AVL tree or a heap of size M. Which is also O(N * logM).
    2. Compute the M smallest values in B[0:mid], in ascending order. Again O(N * logM).
    3. Go through them in parallel, checking whether each A-value is larger than the B-value. Takes O(M), which is also O(N).

    The binary search contributes the remaining factor logN.

    Bonus: You can make it O(N * logN) by sorting all A-values and all B-values before the binary search and then for each mid you get the above A-values and B-values of steps 1 and 2 by instead filtering those already sorted values. In "executable pseudocode":

    from bisect import bisect
    
    A = 2,4,10,6,1,11
    B = 3,5,8,9,7,12
    M = 3
    
    # Sort indices by their values
    N = len(A)
    Ai = sorted(range(N), key=A.__getitem__)
    Bi = sorted(range(N), key=B.__getitem__)
    
    # Check whether A[:L] and B[:L] have the desired property 
    def enough(L):
        largeA = [A[i] for i in Ai if i < L][-M:]
        smallB = [B[i] for i in Bi if i < L][:M]
        return all(a > b for a, b in zip(largeA, smallB))
    
    # Binary search, find the smallest L from M to N that's enough
    L = bisect(range(N), False, M, N, key=enough)
    print(L)
    

    (This assumes there always is a valid L, otherwise you would've specified what to do if not.)

    Attempt This Online!

    I think we can even achieve O(NlogM + MlogN). Still do the binary search, but don't start from scratch for each new mid. Instead just update a tree of the M largest A-values:

    • If mid is larger than previously, insert each new value and remove the smallest after each insertion. Similar for B.
    • If mid is smaller than previously, then undo the insertions/removals you did to get to the larger mid. So this requires keeping track of what updates you did, at least which elements were removed. It's clear which elements got inserted.

    Overall you have O(N) single-value updates, each costing O(logM). And you have O(logN) binary search rounds, each costing O(M) for comparing the M pairs. (This might be inspired by Costantino's answer.)