I have a data.frame where each row is a linear interval. Specifically, these intervals are start and end coordinates on chromosomes (chr below):
df <- data.frame(chr = c("chr1","chr2","chr2","chr3"),
                 strand = c("+","+","-","-"),
                 start = c(34,23,67,51),
                 end = c(52,49,99,120),
                 stringsAsFactors = FALSE)
A chromosome has two strands, hence the strand column.
I'd like to spread these intervals to a width of 1, thereby replacing the start and end columns with a position column. So far I'm using this:
# Expand each interval into one row per position, then bind the pieces together
spread.df <- do.call(rbind, lapply(seq_len(nrow(df)), function(i)
  data.frame(chr = df$chr[i], strand = df$strand[i], position = df$start[i]:df$end[i], stringsAsFactors = FALSE)
))
But for the number of intervals I have and their sizes it's a bit slow. So my question is if there's a faster alternative.
Using map2 from purrr to build each interval's positions, followed by unnest from tidyr, should be faster:
library(dplyr)
library(purrr)
library(tidyr)
df %>%
transmute(chr, strand, position = map2(start, end, `:`)) %>%
unnest(position)
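Or use data.table (grouping by chr and strand assumes each chr/strand pair occurs only once, as in the example; otherwise group by row):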
library(data.table)
setDT(df)[, .(position = start:end), .(chr, strand)]
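If you would rather avoid extra packages, the same expansion can be sketched in base R without a per-row loop. This is only a sketch under the assumption that start <= end in every row; widths and idx are names I made up for illustration. It does not rely on chr/strand pairs being unique, since it works row by row.

```r
df <- data.frame(chr = c("chr1","chr2","chr2","chr3"),
                 strand = c("+","+","-","-"),
                 start = c(34,23,67,51),
                 end = c(52,49,99,120),
                 stringsAsFactors = FALSE)

# Number of positions covered by each interval (both endpoints inclusive)
widths <- df$end - df$start + 1

# Repeat each row index once per position it covers
idx <- rep(seq_len(nrow(df)), widths)

# sequence(widths) counts 1..width within each interval,
# so start + offset - 1 gives the actual coordinate
spread.df <- data.frame(chr = df$chr[idx],
                        strand = df$strand[idx],
                        position = df$start[idx] + sequence(widths) - 1,
                        stringsAsFactors = FALSE)
```

The result has one row per covered position with columns chr, strand and position. Whether it beats the dplyr or data.table versions on your data is something you would need to benchmark.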