I have a data frame of ranges that looks like this:
df <- data.frame(label = c("A", "B", "C"),
                 start = c(2, 11, 22),
                 stop  = c(37, 45, 29))
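Now I would like to obtain a matrix showing the percentage overlap between each pair of ranges (A:B, B:C, A:C, etc.), i.e., how much of one range falls within another. In the expected output, the cell in row i, column j is the percentage of range j that lies within range i. The output should look like this: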
     A     B     C
A  100    76.5  100
B   74.3 100    100
C   20    20.6  100
I have tried to build such a matrix with IRanges or GRanges, but it does not seem to be possible. I hope someone can help me with this!
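To make the expected numbers concrete, here is one cell computed by hand. This assumes, as the expected output implies, that a range's length is `stop - start` and that the cell in row A, column B is the overlap length divided by the length of range B:

```r
# Ranges A and B from the question
a <- c(start = 2, stop = 37)
b <- c(start = 11, stop = 45)

# The overlap runs from the later start to the earlier stop: 37 - 11 = 26
overlap <- min(a["stop"], b["stop"]) - max(a["start"], b["start"])

# Share of range B (length 34) that lies inside range A, in percent
pct_b_in_a <- 100 * overlap / (b["stop"] - b["start"])
round(unname(pct_b_in_a), 1)
# [1] 76.5
```

Dividing the same overlap by the length of A instead (26 / 35) gives the 74.3 in row B, column A.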
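# Cell [i, j] = % of range j covered by range i.
# Overlap length = min(stops) - max(starts); pmax(..., 0) guards against disjoint ranges.
out <- 100 * with(df, t(pmax(outer(stop, stop, pmin) - outer(start, start, pmax), 0) / (stop - start)))
dimnames(out) <- list(df$label, df$label)
out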
# A B C
# A 100.00000 76.47059 100
# B 74.28571 100.00000 100
# C 20.00000 20.58824 100
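When two ranges do not overlap at all, `min(stop) - max(start)` is negative, so it is worth clamping it at zero before dividing. A minimal sketch wrapping the same `outer()` idea in a reusable helper; the function name `overlap_pct` and the extra range `D` are hypothetical, added only for illustration:

```r
# Hypothetical helper: cell [i, j] = % of range j covered by range i
overlap_pct <- function(start, stop, labels) {
  len <- stop - start
  # Pairwise overlap length; negative values mean the two ranges are disjoint
  ov <- outer(stop, stop, pmin) - outer(start, start, pmax)
  ov <- pmax(ov, 0)  # pmax keeps the matrix dimensions of its first argument
  # ov is symmetric, so dividing column j by len[j] gives overlap / length of range j
  out <- 100 * sweep(ov, 2, len, "/")
  dimnames(out) <- list(labels, labels)
  out
}

# The question's data plus a hypothetical range D that overlaps nothing
df2 <- data.frame(label = c("A", "B", "C", "D"),
                  start = c(2, 11, 22, 50),
                  stop  = c(37, 45, 29, 60))
res <- with(df2, overlap_pct(start, stop, label))
round(res, 1)
```

Using `sweep()` over columns replaces the `t()` trick from the answer; both work because the raw overlap matrix is symmetric. Zero-length ranges (`start == stop`) would still divide by zero and need separate handling.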
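library(dplyr)
library(tidyr)
expand_grid(Var1 = df$label, Var2 = df$label) %>%   # every ordered pair of labels
  left_join(df, by = c("Var1" = "label")) %>%       # start.x / stop.x: range Var1
  left_join(df, by = c("Var2" = "label")) %>%       # start.y / stop.y: range Var2
  mutate(
    start = pmax(start.y, start.x),                 # later start
    stop = pmin(stop.x, stop.y),                    # earlier stop
    # share of range Var2 covered by range Var1; pmax(..., 0) guards disjoint ranges
    overlap = 100 * pmax(stop - start, 0) / (stop.y - start.y)
  ) %>%
  pivot_wider(id_cols = Var1, names_from = Var2, values_from = overlap)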
# # A tibble: 3 x 4
# Var1 A B C
# <chr> <dbl> <dbl> <dbl>
# 1 A 100 76.5 100
# 2 B 74.3 100 100
# 3 C 20 20.6 100
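If you would rather not depend on dplyr and tidyr, the same long-format idea can be sketched in base R with `expand.grid()`, `merge()` and `xtabs()`. This is an alternative written for illustration, not part of either answer, and it assumes the `df` from the question:

```r
df <- data.frame(label = c("A", "B", "C"),
                 start = c(2, 11, 22),
                 stop  = c(37, 45, 29))

# Every ordered pair of labels
pairs <- expand.grid(Var1 = df$label, Var2 = df$label, stringsAsFactors = FALSE)

# Attach both ranges; the second merge suffixes the clashing columns:
# start.x / stop.x belong to Var1, start.y / stop.y to Var2
pairs <- merge(pairs, df, by.x = "Var1", by.y = "label")
pairs <- merge(pairs, df, by.x = "Var2", by.y = "label")

# Overlap length (clamped at 0) as a percentage of the Var2 range
ov <- pmax(pmin(pairs$stop.x, pairs$stop.y) - pmax(pairs$start.x, pairs$start.y), 0)
pairs$overlap <- 100 * ov / (pairs$stop.y - pairs$start.y)

# Each (Var1, Var2) cell has exactly one row, so summing just reshapes to wide form
wide <- xtabs(overlap ~ Var1 + Var2, data = pairs)
round(wide, 1)
```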