I have a data set with three columns: type (a or b), start time, and end time. For each type a row, I need the first exact time at which it overlaps any type b row. Here is an example of the data set. The start and end times are in seconds, and I want to know at what point the type a intervals occurred during the type b intervals. Any help on this would be greatly appreciated!
structure(list(type = c("a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b"), starttime = c(470, 858, 1330, 942, 1084, 1320, 1374, 1817, 1394, 1469, 1561, 1796, 1880, 1882, 508, 852, 1203, 1244, 1579, 1865, 2287, 3163, 3784, 4266, 4565, 4936, 5448), endtime = c(485, 873, 1345, 957, 1099, 1335, 1389, 1832, 1409, 1484, 1576, 1811, 1895, 1897, 536, 919, 1216, 1285, 1598, 1892, 2355, 3229, 3817, 4303, 4626, 4976, 5497)), row.names = c(NA, -27L), class = c("tbl_df", "tbl", "data.frame"))
There are a number of approaches that might help, depending on your exact needs. Several packages let you join or merge two data sources on ranges rather than on exact key matches.
One package is fuzzyjoin, which uses "IRanges" from Bioconductor (installed via BiocManager) for interval joins. For example, using interval_join:
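# IRanges is a Bioconductor package; install it once if needed:
# install.packages("BiocManager")
# BiocManager::install("IRanges")
library(fuzzyjoin)

# df is the data frame from the question.
# Join type "a" rows to type "b" rows wherever their
# [starttime, endtime] ranges overlap.
interval_join(df[df$type == "a", ],
              df[df$type == "b", ],
              by = c("starttime", "endtime"))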
Output
  type.x starttime.x endtime.x type.y starttime.y endtime.y
  <chr>        <dbl>     <dbl> <chr>        <dbl>     <dbl>
1 a              858       873 b              852       919
2 a             1880      1895 b             1865      1892
3 a             1882      1897 b             1865      1892
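The join lists the overlapping pairs, but not the moment each overlap begins. If "the first exact number" means the first second shared by both intervals, it would be the later of the two start times. Below is a minimal base R sketch of that idea, with no extra packages. It assumes the intervals are closed (both endpoints included) and uses only a few rows copied from the question's data. The names a_start, b_start, overlap_start and overlap_end are made up for illustration.

```r
# A few rows taken from the question's data, as a plain data.frame
df <- data.frame(
  type      = c("a", "a", "a", "a", "b", "b", "b"),
  starttime = c(470, 858, 1880, 1882, 508, 852, 1865),
  endtime   = c(485, 873, 1895, 1897, 536, 919, 1892)
)

a <- df[df$type == "a", ]
b <- df[df$type == "b", ]

# Every (a row, b row) index combination
idx <- expand.grid(i = seq_len(nrow(a)), j = seq_len(nrow(b)))

pairs <- data.frame(
  a_start = a$starttime[idx$i], a_end = a$endtime[idx$i],
  b_start = b$starttime[idx$j], b_end = b$endtime[idx$j]
)

# Two closed intervals overlap when each starts no later than the other ends
hits <- pairs[pairs$a_start <= pairs$b_end & pairs$b_start <= pairs$a_end, ]

# Assumption: the first overlapping second is the later of the two starts,
# and the last overlapping second is the earlier of the two ends
hits$overlap_start <- pmax(hits$a_start, hits$b_start)
hits$overlap_end   <- pmin(hits$a_end, hits$b_end)
rownames(hits) <- NULL
hits
```

On these rows, the sketch finds the same three pairs as the interval_join output above. Comparing every pair is fine for small data like this, but the work grows with the product of the two row counts, so a range-aware join is likely the better choice for large inputs.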