
Collapse rows with overlapping ranges


I have a data.frame with start and end times:

ranges <- data.frame(start = c(65.72000, 65.72187, 65.94312, 73.75625, 89.61625),
                     stop  = c(79.72187, 79.72375, 79.94312, 87.75625, 104.94062))

> ranges
     start      stop
1 65.72000  79.72187
2 65.72187  79.72375
3 65.94312  79.94312
4 73.75625  87.75625
5 89.61625 104.94062

In this example, the ranges in rows 2 and 3 lie entirely within the span from `start` in row 1 to `stop` in row 4, and row 4 overlaps row 1 because it starts before row 1 stops. Thus, the overlapping ranges in rows 1-4 should be collapsed into one range:

> ranges
     start      stop
1 65.72000  87.75625
5 89.61625 104.94062

I tried this:

mdat <- outer(ranges$start, ranges$stop, function(x,y) y > x)
mdat[upper.tri(mdat)|col(mdat)==row(mdat)] <- NA
mdat

Now I need to figure out how to combine all the TRUE entries, but I'm not sure this is the best approach.
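One way the matrix idea could be finished is to treat the overlap matrix as a graph and merge rows connected through any chain of overlaps. This is a minimal base-R sketch, not the asker's code: it swaps the one-sided `y > x` test for a symmetric overlap test (each range starts no later than the other stops), builds the transitive closure by repeatedly multiplying the boolean matrix by itself, then takes the earliest start and latest stop per group. The names `before`, `ov`, `reach`, and `grp` are illustrative. Each iteration is roughly O(n³), so this only suits small inputs.

```r
# Data from the question
ranges <- data.frame(start = c(65.72000, 65.72187, 65.94312, 73.75625, 89.61625),
                     stop  = c(79.72187, 79.72375, 79.94312, 87.75625, 104.94062))

# before[i, j] is TRUE when start_i <= stop_j
before <- outer(ranges$start, ranges$stop, "<=")
# ov[i, j]: ranges i and j overlap (symmetric, covers containment, diagonal TRUE)
ov <- before & t(before)

# Transitive closure: keep multiplying until no new connections appear,
# so rows linked through a chain of overlaps end up in the same component
reach <- ov
repeat {
  nxt <- (reach %*% reach) > 0
  if (identical(nxt, reach)) break
  reach <- nxt
}

# Label each row by the smallest row index in its component
grp <- apply(reach, 1, function(r) which(r)[1])

# Collapse each component to its earliest start and latest stop
collapsed <- data.frame(start = as.vector(tapply(ranges$start, grp, min)),
                        stop  = as.vector(tapply(ranges$stop,  grp, max)))
print(collapsed)
```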


Solution

  • You can try this:

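    library(dplyr)
    ranges %>% 
           arrange(start) %>%                  # sort intervals by start time
           # new group whenever a start exceeds the largest stop of all earlier rows
           group_by(g = cumsum(cummax(lag(stop, default = first(stop))) < start)) %>% 
           summarise(start = first(start), stop = max(stop))   # earliest start, latest stop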
    
    # A tibble: 2 × 3
    #      g    start      stop
    #  <int>    <dbl>     <dbl>
    #1     0 65.72000  87.75625
    #2     1 89.61625 104.94062
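The pipeline above is a single sweep over the intervals sorted by `start`: a row opens a new group when its `start` is greater than the largest `stop` among all earlier rows. If dplyr is not available, a rough base-R equivalent might look like this (a sketch; the names `r`, `prev_max`, and `g` are illustrative):

```r
# Data from the question
ranges <- data.frame(start = c(65.72000, 65.72187, 65.94312, 73.75625, 89.61625),
                     stop  = c(79.72187, 79.72375, 79.94312, 87.75625, 104.94062))

# Sort by start, as arrange(start) does
r <- ranges[order(ranges$start), ]

# Largest stop among all earlier rows; the first row compares against its own
# stop, mirroring lag(stop, default = first(stop))
prev_max <- cummax(c(r$stop[1], head(r$stop, -1)))

# A new group starts whenever a range begins after everything before it has ended
g <- cumsum(prev_max < r$start)

# Earliest start and latest stop per group
collapsed <- data.frame(start = as.vector(tapply(r$start, g, min)),
                        stop  = as.vector(tapply(r$stop,  g, max)))
print(collapsed)
```

Because the test is a strict `<`, ranges that merely touch (one stops exactly where the next starts) are merged; changing it to `<=` would keep them separate.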