Given a set of intervals S what is the efficient way to find the number of subsets assigned to each interval from the set S.
say for example
S = (11,75), (14,62), (17,32), (24,48), (31,71), (34,74), (40,97), (41,58)
as for output
(11, 75) => 6 -> (14,62), (17,32), (24,48), (31,71), (34,74), (41,58)
(14, 62) => 3 -> (17,32), (24,48), (41,58)
(17, 32) => 0
(24, 48) => 0
(31, 71) => 1 -> (41,58)
(34, 74) => 1 -> (41,58)
(40, 97) => 1 -> (41,58)
(41, 58) => 0
Is it possible to get this mapping in o(nlogn) or substantially less than o(n2)?
There seems to be an O(n*log(n)) way to do this. The intuition is that we need some sort of way to organize the intervals in a way where, at the current step, all intervals that the current one could contain have been considered.
The algorithm is as follows:
Walking through the example, we first find that
sortedIntervals = [(17,32), (24,48), (41,58), (14,62), (31,71), (34,74), (11,75),(40,97)]
And let our sorted set of intervals (sorting now by start time) be
examinedIntervals = []
Now let's step through sortedIntervals
(17,32)
. examinedIntervals
is empty, so it doesn't contain anything.(24, 48)
. examinedIntervals = [(17, 32)]
. Because there are no intervals that start after 24, we have that (24, 48)
contains 0 intervals.(41, 58)
. examinedIntervals = [(17, 32), (24, 48)]
. No intervals have a start time after 41, so (41, 58)
contains 0 intervals(14, 62)
. examinedIntervals = [(17, 32), (24, 48), (41, 58)]
. Now all three intervals have a start time after 14, so (14, 62)
contains 3 intervals(31, 71)
. examinedIntervals = [(14, 62), (17, 32), (24, 48), (41, 58)]
. Only one interval comes after 31, so (31, 71)
contains 1 interval(34, 74)
. examinedIntervals = [(14, 62), (17, 32), (24, 48), (31, 71), (41, 58)]
. One interval comes after 34, so (34, 74)
contains 1 interval(11, 75)
. examinedIntervals = [(14, 62), (17, 32), (24, 48), (31, 71), (34, 74), (41, 58)]
, and all 6 intervals have a start time after 14.(40, 97)
. examinedIntervals = [(11, 75), (14, 62), (17, 32), (24, 48), (31, 71), (34, 74), (41, 58)]
. Only one interval comes after 40, so (40, 97)
contains 1 interval.Summarizing we do indeed get the correct results:
(40, 97) -> 1
(11, 75) -> 6
(34, 74) -> 1
(31, 71) -> 1
(14, 62) -> 3
(41, 58) -> 0
(24, 48) -> 0
(17, 32) -> 0
It can also be verified easily that runtime is O(n*log(n)), assuming the use of an efficient sort and a balanced tree in the second part. The initial sort runs in the given amount of time. The second portion involves n insertions into a binary tree of height O(log(n)), giving a runtime of O(nlog(n)). Because we're summing two steps that run in O(nlog(n)), the overall runtime is O(nlog(n)).