Tags: r, intervals, spread, genomicranges, iranges

A fast way to spread a linear range


I have a data.frame where each row is a linear interval; specifically, these intervals are start and end coordinates on chromosomes (the chr column below):

df <- data.frame(chr = c("chr1","chr2","chr2","chr3"),
                 strand = c("+","+","-","-"),
                 start = c(34,23,67,51),
                 end = c(52,49,99,120),
                 stringsAsFactors = FALSE)

A chromosome has two strands, hence the strand column.

I'd like to spread these intervals into width-1 positions, replacing the start and end columns with a single position column (one row per position). So far I'm using this:

spread.df <- do.call(rbind, lapply(seq_len(nrow(df)), function(i)
  data.frame(chr = df$chr[i], strand = df$strand[i], position = df$start[i]:df$end[i], stringsAsFactors = FALSE)
))

This works, but with as many intervals as I have, and with intervals this wide, it is slow. Is there a faster alternative?


Solution

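  • purrr's map2() should be fast: it builds a list column holding each row's start:end vector, and tidyr's unnest() then expands that list column into one row per position, instead of building and rbind()-ing one data.frame per interval.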

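    library(dplyr)
    library(purrr)
    library(tidyr)
    df %>% 
      # one list element per row: the vector start:end
      transmute(chr, strand, position = map2(start, end, `:`)) %>% 
      # expand the list column into one row per position
      unnest(position)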
    

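    Or use data.table. Grouping by row rather than by chr and strand keeps two intervals that share a chr and strand from landing in the same group, where start:end would use only the first start and end:

    library(data.table)
    # group by row index; drop the helper column afterwards and print
    setDT(df)[, .(chr, strand, position = start:end), by = .(row_id = seq_len(nrow(df)))][, row_id := NULL][]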
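    If neither the tidyverse nor data.table is available, the same idea can be sketched in base R. This is a minimal sketch, not a benchmarked solution: compute how many positions each interval yields, repeat chr and strand that many times with rep(), and concatenate all start:end sequences once. Avoiding one data.frame per row plus a large do.call(rbind, ...) is the likely reason this and the answers above are faster than the loop in the question. The names lens and spread_df are illustrative, not from the question.

```r
# Base-R sketch: expand each interval [start, end] into one row per position
# without building a separate data.frame for every interval.
df <- data.frame(chr = c("chr1", "chr2", "chr2", "chr3"),
                 strand = c("+", "+", "-", "-"),
                 start = c(34, 23, 67, 51),
                 end = c(52, 49, 99, 120),
                 stringsAsFactors = FALSE)

# Number of width-1 positions each interval produces (bounds are inclusive)
lens <- df$end - df$start + 1

spread_df <- data.frame(
  # repeat each row's chr and strand once per position in its interval
  chr = rep(df$chr, times = lens),
  strand = rep(df$strand, times = lens),
  # build start:end for every row and concatenate into one vector
  position = unlist(Map(`:`, df$start, df$end), use.names = FALSE),
  stringsAsFactors = FALSE
)

head(spread_df)
nrow(spread_df)  # equals sum(lens)
```

    For the example data the result has sum(lens) = 19 + 27 + 33 + 70 = 149 rows, in the same order as the original intervals. On R 4.0.0 or later, sequence(lens, from = df$start) should produce the same position vector without the intermediate list.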