Search code examples
python-3.xalgorithmperformanceoptimizationintervals

Deciding if all intervals are overlapping


I'm doing a problem that n people is standing on a line and each person knows their own position and speed. I'm asked to find the minimal time to have all people go to any spot.

Basically what I'm doing is finding the minimal time using binary search and have every ith person's furthest distance to go in that time in intervals. If all intervals overlap, there is a spot that everyone can go to.

I have a solution to this question but the time limit exceeded for it for my bad solution to find the intervals. My current solution runs too slow and I'm hoping to get a better solution.

my code:

    people = int(input())
    peoplel = [list(map(int, input().split())) for _ in range(people)] # first item in people[i] is the position of each person, the second item is the speed of each person
    def good(time):
        return checkoverlap([[i[0] - time *i[1], i[0] + time * i[1]] for i in peoplel])
        # first item,second item = the range of distance a person can go to 
 

    def checkoverlap(l):
        for i in range(len(l) - 1):
            seg1 = l[i]
            for i1 in range(i + 1, len(l)):
                seg2 = l[i1]
                if seg2[0] <= seg1[0] <= seg2[1] or seg1[0] <= seg2[0] <= seg1[1]:
                    continue
                elif seg2[0] <= seg1[1] <= seg2[1] or seg1[0] <= seg2[1] <= seg1[1]:
                    continue
                return False
        return True

(this is my first time asking a question so please inform me about anything that is wrong)


Solution

  • One does simply go linear

    A while after I finished the answer I found a simplification that removes the need for sorting and thus allows us to further reduce the complexity of finding if all the intervals are overlapping to O(N).

    If we look at the steps that are being done after the initial sort we can see that we are basically checking

    if max(lower_bounds) < min(upper_bounds):
        return True
    else:
        return False
    

    And since both min and max are linear without the need for sorting, we can simplify the algorithm by:

    • Creating an array of lower bounds - one pass.
    • Creating an array of upper bounds - one pass.
    • Doing the comparison I mentioned above - two passes over the new arrays.

    All this could be done together in one one pass to further optimize(and to prevent some unnecessary memory allocation), however this is clearer for the explanation's purpose.

    Since the reasoning about the correctness and timing was done in the previous iteration, I will skip it this time and keep the section below since it nicely shows the thought process behind the optimization.

    One sort to rule them all

    Disclaimer: This section was obsoleted time-wise by the one above. However since it in fact allowed me to figure out the linear solution, I'm keeping it here.

    As the title says, sorting is a rather straightforward way of going about this. It will require a little different data structure - instead of holding every interval as (min, max) I opted for holding every interval as (min, index), (max, index). This allows me to sort these by the min and max values. What follows is a single linear pass over the sorted array. We also create a helper array of False values. These represent the fact that at the beginning all the intervals are closed.
    Now comes the pass over the array:

    • Since the array is sorted, we first encounter the min of each interval. In such case, we increase the openInterval counter and a True value of the interval itself. Interval is now open - until we close the interval, the person can arrive at the party - we are within his(or her) range.
    • We go along the array. As long as we are opening the intervals, everything is ok and if we manage to open all the intervals, we have our party destination where all the social distancing collapses. If this happens, we return True.
    • If we close any of the intervals, we have found our party breaker who can't make it anymore. (Or we can discuss that the party breakers are those who didn't bother to arrive yet when someone has to go already). We return False.

    The resulting complexity is O(Nlog(N)) caused by the initial sort since the pass itself is linear in nature. This is quite a bit better than the original O(n^2) caused by the "check all intervals pairwise" approach.

    The code:

    import numpy as np
    import cProfile, pstats, io
    
    #random data for a speed test. Not that useful for checking the correctness though.
    testSize = 10000
    x = np.random.randint(0, 10000, testSize)
    y = np.random.randint(1, 100, testSize)
    peopleTest = [x for x in zip(x, y)]
    
    #Just a basic example to help with the reasoning about the correctness
    peoplel = [(1, 2), (3, 1), (8, 1)]
    # first item in people[i] is the position of each person, the second item is the speed of each person
    
    
    def checkIntervals(people, time):
        a = [(x[0] - x[1] * time, idx) for idx, x in enumerate(people)]
        b = [(x[0] + x[1] * time, idx) for idx, x in enumerate(people)]
        checks = [False for x in range(len(people))]
        openCount = 0
        intervals = [x for x in sorted(a + b, key=lambda x: x[0])]
        for i in intervals:
            if not checks[i[1]]:
                checks[i[1]] = True
                openCount += 1
                if openCount == len(people):
                    return True
            else:
                return False
    
        print(intervals)
    
    
    
    def good(time, people):
        return checkoverlap([[i[0] - time * i[1], i[0] + time * i[1]] for i in people])
        # first item,second item = the range of distance a person can go to
    
    
    def checkoverlap(l):
        for i in range(len(l) - 1):
            seg1 = l[i]
            for i1 in range(i + 1, len(l)):
                seg2 = l[i1]
                if seg2[0] <= seg1[0] <= seg2[1] or seg1[0] <= seg2[0] <= seg1[1]:
                    continue
                elif seg2[0] <= seg1[1] <= seg2[1] or seg1[0] <= seg2[1] <= seg1[1]:
                    continue
                return False
        return True
    
    
    pr = cProfile.Profile()
    pr.enable()
    
    print(checkIntervals(peopleTest, 10000))
    
    print(good(10000, peopleTest))
    
    pr.disable()
    s = io.StringIO()
    sortby = "cumulative"
    ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
    ps.print_stats()
    print(s.getvalue())
    

    The profiling stats for the pass over test array with 10K random values:

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    8.933    8.933 (good)
        1    8.925    8.925    8.926    8.926 (checkoverlap)
        1    0.003    0.003    0.023    0.023 (checkIntervals)
        1    0.008    0.008    0.010    0.010 {built-in method builtins.sorted}