I am looking into this problem:
Let's assume we have N segments on the X-axis with different start and end points. This is described with the next data structure:
[(start1, end1),..., (startN, endN)]
Now we want to calculate the whole size of such segments, but overlaps should be not be double counted.
input: [(0,3),(1,2), (6,7)]
output: 4
Because segment (1,2) overlaps with segment (0,3), the distance between 0 and 3 is just 3, and then between 6 and 7 it is 1. So 3+1 = 4.
Is there an algorithm to make this calculation for large input sizes?
You can proceed as follows:
Extract both coordinates of each pair and mark whether they end a segment (flag) or not (when they start one). Sort these new pairs (x, end). Then process them. When the coordinate starts a segment, push the coordinate on a stack. When it ends a segment, pop the corresponding start coordinate from the stack. If the stack is empty add the difference between end and start coordinate to the total:
lst = [(0,3),(1,2), (6,7)]
stack = []
total = 0
for x, end in sorted((x, i) for pair in lst for i, x in enumerate(pair)):
if end:
start = stack.pop()
if not len(stack):
total += x - start
else:
stack.append(x)
print(total) # 4