Search code examples
algorithmdata-structuresdivide-and-conquer

Maximum number of subsets of overlapping intervals


Given a set of intervals S what is the efficient way to find the number of subsets assigned to each interval from the set S.

say for example

S = (11,75), (14,62), (17,32), (24,48), (31,71), (34,74), (40,97), (41,58)

as for output

(11, 75) => 6 -> (14,62), (17,32), (24,48), (31,71), (34,74), (41,58)  
(14, 62) => 3 -> (17,32), (24,48), (41,58)   
(17, 32) => 0  
(24, 48) => 0  
(31, 71) => 1 -> (41,58)   
(34, 74) => 1 -> (41,58)  
(40, 97) => 1 -> (41,58)  
(41, 58) => 0

Is it possible to get this mapping in o(nlogn) or substantially less than o(n2)?


Solution

  • There seems to be an O(n*log(n)) way to do this. The intuition is that we need some sort of way to organize the intervals in a way where, at the current step, all intervals that the current one could contain have been considered.

    The algorithm is as follows:

    1. Sort the intervals by end times in ascending order, and sort tied end times by their start times in descending order.
    2. Iterate over the intervals and maintain a sorted set of all start times seen. The idea is that, when looking at the current interval, all intervals that it could contain have already been examined, and the number of intervals that the current one does contain is simply the number of elements in our built set that have a start time later than the current one.

    Walking through the example, we first find that

    sortedIntervals = [(17,32), (24,48), (41,58), (14,62), (31,71), (34,74), (11,75),(40,97)]

    And let our sorted set of intervals (sorting now by start time) be

    examinedIntervals = []

    Now let's step through sortedIntervals

    1. Consider (17,32). examinedIntervals is empty, so it doesn't contain anything.
    2. Consider (24, 48). examinedIntervals = [(17, 32)] . Because there are no intervals that start after 24, we have that (24, 48) contains 0 intervals.
    3. Consider (41, 58). examinedIntervals = [(17, 32), (24, 48)]. No intervals have a start time after 41, so (41, 58) contains 0 intervals
    4. Consider (14, 62). examinedIntervals = [(17, 32), (24, 48), (41, 58)]. Now all three intervals have a start time after 14, so (14, 62) contains 3 intervals
    5. Consider (31, 71). examinedIntervals = [(14, 62), (17, 32), (24, 48), (41, 58)]. Only one interval comes after 31, so (31, 71) contains 1 interval
    6. Consider (34, 74). examinedIntervals = [(14, 62), (17, 32), (24, 48), (31, 71), (41, 58)]. One interval comes after 34, so (34, 74) contains 1 interval
    7. Consider (11, 75). examinedIntervals = [(14, 62), (17, 32), (24, 48), (31, 71), (34, 74), (41, 58)], and all 6 intervals have a start time after 14.
    8. Consider (40, 97). examinedIntervals = [(11, 75), (14, 62), (17, 32), (24, 48), (31, 71), (34, 74), (41, 58)]. Only one interval comes after 40, so (40, 97) contains 1 interval.

    Summarizing we do indeed get the correct results:

    (40, 97) -> 1
    (11, 75) -> 6
    (34, 74) -> 1
    (31, 71) -> 1
    (14, 62) -> 3
    (41, 58) -> 0
    (24, 48) -> 0
    (17, 32) -> 0
    

    It can also be verified easily that runtime is O(n*log(n)), assuming the use of an efficient sort and a balanced tree in the second part. The initial sort runs in the given amount of time. The second portion involves n insertions into a binary tree of height O(log(n)), giving a runtime of O(nlog(n)). Because we're summing two steps that run in O(nlog(n)), the overall runtime is O(nlog(n)).