I am looking for an algorithm that classifies (with letters) a set of intervals according to their overlap.
The output should associate each overlapping interval with the same letter, thus constituting a unique group.
Intervals that overlap with multiple groups will be classified with several letters corresponding to each unique group.
Non-overlapping intervals also constitute a unique group.
Graphically the problem is as follows:
I am working in R and my intervals are these:
structure(list(Interval = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), Start = c(5.3,
6.5, 7.6, 7.8, 8, 8.3, 8.5, 8.7, 8.8, 9.9), End = c(7.5, 8.7,
9.8, 10, 10.2, 10.5, 10.7, 10.9, 11, 12.1)), row.names = c(NA,
-10L), spec = structure(list(cols = list(Interval = structure(list(), class = c("collector_double",
"collector")), Start = structure(list(), class = c("collector_double",
"collector")), End = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = "\t"), class = "col_spec"), class = c("spec_tbl_df","tbl_df", "tbl", "data.frame"))
I think the ivs
package may have the solution but I don't know the procedure.
I think this function does what you need.
label_overlaps <- function(data, Start = "Start", End = "End",
label = "label", labs = letters) {
data <- data[order(data[[Start]]), ]
data[[label]] <- ""
for(i in labs) {
n <- which(data[[label]] == "")
if(length(n) == 0) break
n <- n[1]
m <- which(data[[Start]] < data[[End]][n] & data[[End]] > data[[Start]][n])
data[[label]][c(n, m)] <- paste0(data[[label]][c(n, m)], i)
}
if(any(!nzchar(data[[label]]))) warning("All labels exhausted")
return(data)
}
Use is very simple:
label_overlaps(df)
#> # A tibble: 10 x 4
#> Interval Start End label
#> <dbl> <dbl> <dbl> <chr>
#> 1 1 5.3 7.5 a
#> 2 2 6.5 8.7 ab
#> 3 3 7.6 9.8 b
#> 4 4 7.8 10 bc
#> 5 5 8 10.2 bc
#> 6 6 8.3 10.5 bc
#> 7 7 8.5 10.7 bc
#> 8 8 8.7 10.9 bc
#> 9 9 8.8 11 bc
#> 10 10 9.9 12.1 c
If we use data that matches your sample image, we get:
library(geomtextpath)
df <- data.frame(Interval = c(3, 2, 1, 3, 2, 2),
Start = c(1, 3, 7, 14, 15, 18.3),
End = c(4, 13, 10, 17, 17.7, 22))
ggplot(label_overlaps(df), aes(Start, Interval, xend = End, yend = Interval)) +
geom_segment(linewidth = 9, lineend = "round") +
geom_textsegment(aes(label = label, group = seq_along(label)),
textcolour = "black", gap = FALSE,
linewidth = 8, lineend = "round", color = "#bffec0") +
coord_cartesian(clip = "off") +
theme_void(base_size = 16) +
theme(aspect.ratio = 1/8)
And with your actual data, we get:
ggplot(label_overlaps(df), aes(Start, Interval)) +
geom_textsegment(aes(label = label, xend = End, yend = Interval,
color = label), textcolour = "black", gap = FALSE,
vjust = -1, linewidth = 6, lineend = "round") +
scale_color_manual(values = c("red", "orangered", "orange",
"green4", "dodgerblue"), guide = "none") +
theme_minimal(base_size = 16)