Search code examples
algorithmintervalsgreedycover

Max coverage disjoint intervals


Assume you have k<=10^5 intervals [a_i, b_i] \in [1,10^18] (some of them may overlap), and you need to choose a set of intervals mutually disjoint such that their union is maximal. Not maximum number of disjoint intervals, but the union must cover the most.

Can't try all possible subsets 2^k infeasible. Greedy approaches ordering by a_i ( interval covering algorithm) and ordering by b_i ( maximum number of disjoint intervals algorithm ) didn't work Can't figure out if there is a dynamic program solution. Given the size of the input, I think the solution should be O(k log k) or O(k)

Examples 1. [1,4], [3,5], [5,9], [7, 18] Sol [3,5]u[7,18]

  1. [1,2], [2,6], [3,4], [5,7] Sol [1,2]u[3,4]u[5,7]

  2. [2,30], [25,39], [30,40] Sol [2,30]


Solution

  • The problem can be solved in O(k log(k)).

    First sort the intervals by their upper bounds (the b_is). Let I(1), I(2), ..., I(k) be the list of sorted intervals. That is,

    b_1 <= b_2 <= ... <= b_k
    

    Denote by w(i) the length of interval I(i). That is,

    w(i) = b_i - a_i
    

    Denote by f(i) the total length of the optimal solution among those whose last interval is I(i). That is, the solution corresponding to f(i) is a set which:

    1. contains the interval I(i)
    2. doesn't contain any interval whose upper bound is above b_i
    3. has the maximum cover among the sets of (non-overlapping) intervals satisfying 1+2

    Now we are going to compute f(1), f(2), ..., f(k) and return the maximum value of them all. Clearly, the optimal solution corresponds to one of the f(i)s and therefore the maximal f(i) is the optimal solution.

    To compute each f(i) we use dynamic programming. We do this by relying on the following recurrence relation:

    f(i) = w(i) + max{f(j) | b_j < a_i}
    

    I'll demonstrate the computation with your first input example:

    I(1)=[1, 4],  w(1)=3
    I(2)=[3, 5],  w(2)=2
    I(3)=[5, 9],  w(3)=4
    I(4)=[7, 18], w(4)=11
    

    We compute f(i) for i=1, 2, 3, 4:

    f(1) = w(1) + max{None} = 3 
        f(1) intervals: {I(1)}
    
    f(2) = w(2) + max{None} = 2 
        f(2) intervals: {I(2)}
    
    f(3) = w(3) + max{f(1)} = 4 + 1 = 5 
        f(3) intervals = {I(1), I(3)}
    
    f(4) = w(4) + max{f(1), f(2)} = 11 + f(1) = 11 + 3 = 14 
        f(4) intervals = {I(1), I(4)}
    

    The maximum f(i) is f(4) which corresponds to the set of intervals {I(1), I(4)}, the optimal solution.