
Find value of overlapping ranges of integers in R that are NOT times or genomes


I'm trying to calculate overlapping depth ranges for marine species and human activities. Each species has a min and max depth it occurs at, and I want to efficiently calculate how much of that depth range overlaps with the depth range of each of 4 different activities. I think this can be done with data.table::foverlaps() or IRanges::findOverlaps(), but I can't figure out how to calculate the size of the overlap, not just whether there is one. So if species D is found between 40-100m depth, and activity 1 occurs at 0-50m depth, the overlap is 10m.

For example,

min_1 <- 0 
max_1 <- 50
min_2 <- 0 
max_2 <- 70
min_3 <- 0
max_3 <- 200
min_4 <- 0
max_4 <- 500

activities <- data.frame(min_1, max_1, min_2, max_2, min_3, max_3, min_4, max_4)

spp_id <- c("a", "b", "c", "d")
spp_depth_min <- c(0, 20, 30, 40)
spp_depth_max <- c(200, 500, 50, 100)

species <- data.frame(spp_id, spp_depth_min, spp_depth_max)

## data.table approach?

library(data.table)

setDT(activities)
setDT(species)

foverlaps(species, activities, ...) ## Or do I need to subset each activity and do separate calculations? 

Would it be easier to write a function? I'm really unfamiliar with that! This seems like it should be a common/easy thing to do; I don't know why it's confusing me so much.


Solution

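  • I restructured your activities table into long form (one row per activity) so that all 4 calculations can be done at once. An overlap join with foverlaps() then pairs each species with every activity it overlaps, and the overlap length is calculated from the joined columns as the smaller of the two maxima minus the larger of the two minima.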

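    library(data.table)
    
    # Long form: one row per activity (min_1 ... max_4 as defined in the question)
    activities <- data.table(
      act = c('act_1','act_2','act_3','act_4'),
      a_min = c(min_1, min_2, min_3, min_4),
      a_max = c(max_1, max_2, max_3, max_4)
      )
    
    spp_id <- c("a", "b", "c", "d")
    spp_depth_min <- c(0, 20, 30, 40)
    spp_depth_max <- c(200, 500, 50, 100)
    
    species <- data.table(spp_id, spp_depth_min, spp_depth_max)
    
    # foverlaps() requires the second table to be keyed on its interval columns
    setkey(activities,a_min,a_max)
    
    # Overlap join: one row per species/activity pair whose ranges overlap
    ol <- foverlaps(species, activities, 
      by.x = c('spp_depth_min','spp_depth_max'), 
      by.y = c('a_min','a_max')
      )
    # Overlap length = smaller of the two maxima minus larger of the two minima
    ol[,ol_length := pmin(spp_depth_max,a_max)-pmax(spp_depth_min,a_min)]
    ol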
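
Since the question also asks whether writing a function would be easier, here is a minimal base R sketch of the same overlap arithmetic without an overlap join. The helper name `overlap_length` and the `pairs` table are made up for illustration and are not part of data.table. It assumes the same example depths as above and pairs every species with every activity, returning 0 where two ranges don't overlap.

```r
# Hypothetical helper (name chosen for illustration): length of the overlap
# between the ranges [min_a, max_a] and [min_b, max_b].
# pmax(0, ...) clamps the result to 0 when the ranges do not overlap.
overlap_length <- function(min_a, max_a, min_b, max_b) {
  pmax(0, pmin(max_a, max_b) - pmax(min_a, min_b))
}

# Same example data as in the question, as plain data frames
species <- data.frame(
  spp_id = c("a", "b", "c", "d"),
  spp_depth_min = c(0, 20, 30, 40),
  spp_depth_max = c(200, 500, 50, 100)
)
activities <- data.frame(
  act = c("act_1", "act_2", "act_3", "act_4"),
  a_min = c(0, 0, 0, 0),
  a_max = c(50, 70, 200, 500)
)

# by = NULL makes merge() return the Cartesian product:
# every species paired with every activity
pairs <- merge(species, activities, by = NULL)
pairs$ol_length <- with(
  pairs,
  overlap_length(spp_depth_min, spp_depth_max, a_min, a_max)
)

# Species d (40-100 m) against activity 1 (0-50 m): 10, as in the question
pairs[pairs$spp_id == "d" & pairs$act == "act_1", "ol_length"]
```

With only 4 activities the full cross join stays small. For large tables the keyed foverlaps() join above is likely the better fit, because it only returns pairs that actually overlap.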