
How to combine interval data into fewer intervals in R?


I am trying to collapse a series of intervals into fewer, equally meaningful intervals.

Consider, for example, this list of intervals:

Intervals = list(
  c(23,34),
  c(45,48),
  c(31,35),
  c(7,16),
  c(5,9),
  c(56,57),
  c(55,58)
)

Because some of the intervals overlap, the same ranges can be described with fewer vectors. Plotting the intervals makes it obvious that a list of 4 vectors would be enough:

plot(1,1,type="n",xlim=range(unlist(Intervals)),ylim=c(0.9,1.1))
segments(
    x0=sapply(Intervals,"[",1),
    x1=sapply(Intervals,"[",2),
    y0=rep(1,length(Intervals)),
    y1=rep(1,length(Intervals)),
    lwd=10
    )


How can I reduce my Intervals list so that it carries the same information as the plot? (Performance matters.)

The desired output for the above example is:

Intervals = list(
  c(5,16),
  c(23,35),
  c(45,48),
  c(55,58)
)

Solution

  • What you need is the reduce() function in the IRanges package (distributed through Bioconductor).

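    # Stack the list into a two-column matrix: column 1 = start, column 2 = end
    In.df <- do.call(rbind, Intervals)
    library(IRanges)
    
    # Build an IRanges object from the start and end columns
    In.ir <- IRanges(In.df[, 1], In.df[, 2])
    
    # Merge overlapping ranges
    out.ir <- reduce(In.ir)
    out.ir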
    # IRanges of length 4
    #     start end width
    # [1]     5  16    12
    # [2]    23  35    13
    # [3]    45  48     4
    # [4]    55  58     4
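
If installing IRanges is not an option, the same merging idea can be sketched in base R. This is only an illustration, not what IRanges does internally: `merge_intervals` is a hypothetical helper name, and it assumes each list element is a numeric `c(start, end)` with `start <= end`. It sorts the intervals by start, tracks the furthest end seen so far with `cummax()`, and opens a new group whenever a start lies beyond that point.

```r
# Hypothetical base-R helper (not part of IRanges): merge overlapping intervals.
# Assumes each element of `intervals` is c(start, end) with start <= end.
merge_intervals <- function(intervals) {
  m <- do.call(rbind, intervals)             # one row per interval: start, end
  m <- m[order(m[, 1]), , drop = FALSE]      # sort rows by start
  reach <- cummax(m[, 2])                    # furthest end seen so far
  # A new group begins when a start lies beyond every earlier end
  new_group <- c(TRUE, m[-1, 1] > reach[-nrow(m)])
  group <- cumsum(new_group)
  starts <- m[new_group, 1]                  # first (smallest) start of each group
  ends <- as.vector(tapply(m[, 2], group, max))  # largest end of each group
  Map(c, starts, ends)                       # back to a list of c(start, end)
}

Intervals <- list(
  c(23, 34), c(45, 48), c(31, 35), c(7, 16),
  c(5, 9), c(56, 57), c(55, 58)
)
merged <- merge_intervals(Intervals)
```

One difference to keep in mind: with its default `min.gapwidth = 1L`, `reduce()` also merges adjacent integer ranges such as 1-2 and 3-4, whereas this sketch merges only intervals that actually overlap or touch at an endpoint.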