Combination of Non-overlapping interval PAIRS

I recently did a coding challenge where I was tasked to return the number of unique interval pairs that do not overlap when given the starting points in one list and the ending points in one list. I was able to come up with an n^2 solution and eliminated duplicates by using a set to hash each entry tuple of (start, end). I was wondering if there was a more efficient way of approaching this, or this is the best I could do:

def paperCuttings(starting, ending):
    # Pair each start with its corresponding end and sort
    intervals = sorted(zip(starting, ending), key=lambda x: x[1])
    non_overlaps = set()

    print(intervals)

    # Store valid combinations
    for i in range(len(intervals)):
        for j in range(i+1, len(intervals)):
            # If the ending of the first is less than the starting of the second, they do not overlap
            if intervals[i][1] < intervals[j][0]:                
                non_overlaps.add((intervals[i], intervals[j]))

    return len(non_overlaps)

starting = [1,1,6,7]
ending = [5,3,8,10]
print(paperCuttings(starting, ending)) # should return 4

starting2 = [3,1,2,8,8]
ending2 = [5, 3, 7, 10, 10]
print(paperCuttings(starting2, ending2))  # should return 3

I ask because I timed out in some hidden test cases

Solution

This is a O(n*log n) solution in Ruby (n being the number of intervals). I will include a detailed explanation that should make conversion of the code to Python straightforward.

I assume that non-overlapping intervals have no points in common, not even endpoints¹.

def paperCuttings(starting, ending)
  # Compute an array of unique intervals sorted by the beginning
  # of each interval
  intervals = starting.zip(ending).uniq.sort

  n = intervals.size
  count = 0

  # Loop over the indices of all but the last interval.
  # The interval at index i of intervals will be referred to
  # below as "interval i" 
  (0..n-2).each do |i|
    # intervals[i] is interval i, an array containing its two
    # endpoints. Extract the second endpoint to the variable endpoint
    _, endpoint = intervals[i]
     
    # Employ a binary search to find the index of the first
    # interval j > i for which intervals[j].first > endpoint,
    # where intervals[j].first is the beginning of interval j
    k = (i+1..n-1).bsearch { |j| intervals[j].first > endpoint }

    # k equals nil if no such interval is found, in which case
    # continue the loop the next interval i
    next if k == nil

    # As intervals i and k are non-overlapping, interval i is
    # non-overlapping with all intervals l, k <=l<= n-1, of which
    # there are n-k, so add n-k to count
    count = count + n - k
  end
  # return count
  count
end

Try it.

starting = [1, 1, 6,  7]
ending   = [5, 3, 8, 10]

paperCuttings(starting, ending)
  #=> 4

starting = [3, 1, 2,  8,  8]
ending   = [5, 3, 7, 10, 10]
paperCuttings(starting, ending)
  #=> 3

Here I will explain the calculation

intervals = starting.zip(ending).uniq.sort

for

starting = [3, 1, 2,  8,  8]
ending   = [5, 3, 7, 10, 10]

a = starting.zip(ending)
  #=> [[3, 5], [1, 3], [2, 7], [8, 10], [8, 10]]
b = a.uniq
  #=> [[3, 5], [1, 3], [2, 7], [8, 10]]
b.sort
  #=> [[1, 3], [2, 7], [3, 5], [8, 10]]

The removal of duplicates is required by the statement of the problem.

The elements of b are sorted by their first elements. Had there been two arrays with the same first element the second element would be used as the tie-breaker, though that's not important here.

The documentation for Ruby's binary search method (over a range) is here. Binary searches have a time complexity of O(log n), which accounts for the log term in the overall time complexity of O(n*log n).

^{1. If intervals that share only a single endpoint are regarded as non-overlapping, change starting[j] >= endpoint to starting[j] > endpoint.}