
R: obtain a matrix of percentage overlap between ranges


I have a data frame of ranges that looks like this:

df <- data.frame(label = c("A", "B", "C"),
                 start = c(2, 11, 22),
                 stop = c(37, 45, 29))

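Now I would like a matrix showing the percentage overlap between every pair of ranges (A:B, B:C, A:C, etc.), i.e., how much of one range lies within another. Each cell should give the percentage of the column range that falls within the row range; for example, C lies entirely within A, so row A, column C is 100. The output should look like this: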

          A       B      C
 A        100     76.5   100
 B        74.3    100    100
 C        20      20.6   100
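The expected numbers can be reproduced with a simple rule, inferred here from the data rather than stated in the question: each cell is the length of the intersection of the two ranges, as a percentage of the length of the column range, with lengths measured as stop - start (not stop - start + 1). The clamp at 0 is an added assumption for ranges that do not intersect; none of A, B and C are disjoint, so the example never exercises it.

```latex
\text{ov}_{ij} = \max\bigl(0,\ \min(\text{stop}_i, \text{stop}_j) - \max(\text{start}_i, \text{start}_j)\bigr),
\qquad
M_{ij} = 100 \cdot \frac{\text{ov}_{ij}}{\text{stop}_j - \text{start}_j}

% Row A, column B
\text{ov}_{AB} = \min(37, 45) - \max(2, 11) = 26,
\qquad M_{AB} = 100 \cdot \tfrac{26}{45 - 11} = 100 \cdot \tfrac{26}{34} \approx 76.5

% Row B, column A (same intersection, divided by the length of A)
M_{BA} = 100 \cdot \tfrac{26}{37 - 2} = 100 \cdot \tfrac{26}{35} \approx 74.3

% Row C, column A
\text{ov}_{CA} = \min(29, 37) - \max(22, 2) = 7,
\qquad M_{CA} = 100 \cdot \tfrac{7}{35} = 20
```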

I have tried to build such a matrix with IRanges and GRanges, but it does not seem possible. I hope someone can help me with this!


Solution

  • Base R

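    # overlap[i, j] = min(stop) - max(start) for each pair; pmax(..., 0) turns
    # disjoint pairs (negative values) into 0 and keeps the matrix dims, since
    # pmax copies attributes from its first argument. Dividing by (stop - start)
    # and transposing expresses each cell as a percentage of the column range.
    out <- 100 * with(df, t(pmax(outer(stop, stop, pmin) - outer(start, start, pmax), 0) /
                            (stop - start)))
    dimnames(out) <- list(df$label, df$label)
    out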
    #           A         B   C
    # A 100.00000  76.47059 100
    # B  74.28571 100.00000 100
    # C  20.00000  20.58824 100
    

  • tidyverse

    library(dplyr)
    library(tidyr)
    expand_grid(Var1 = df$label, Var2 = df$label) %>%
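      left_join(df, by = c("Var1" = "label")) %>%   # .x columns: row range
      left_join(df, by = c("Var2" = "label")) %>%   # .y columns: column range
      mutate(
        start = pmax(start.y, start.x),
        stop  = pmin(stop.x, stop.y),
        # clamp at 0 for disjoint ranges; divide by the length of the column range
        overlap = 100 * pmax(stop - start, 0) / (stop.y - start.y)
      ) %>%
      pivot_wider(id_cols = Var1, names_from = Var2, values_from = overlap)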
    # # A tibble: 3 x 4
    #   Var1      A     B     C
    #   <chr> <dbl> <dbl> <dbl>
    # 1 A     100    76.5   100
    # 2 B      74.3 100     100
    # 3 C      20    20.6   100
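The outer()/t() one-liner in the base R answer is compact but hides which length each cell is divided by. The sketch below computes every cell with a plain double loop so the rule is explicit, then cross-checks it against the vectorised version. The helper name `overlap_pct_loop` is hypothetical (not part of the answer), and clamping disjoint pairs at 0 is an assumption the example data never exercises.

```r
df <- data.frame(label = c("A", "B", "C"),
                 start = c(2, 11, 22),
                 stop  = c(37, 45, 29))

# Hypothetical helper: cell [i, j] is the percentage of range j (column)
# that lies within range i (row); lengths are measured as stop - start.
overlap_pct_loop <- function(start, stop, labels) {
  n <- length(start)
  out <- matrix(NA_real_, n, n, dimnames = list(labels, labels))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # intersection length, clamped at 0 when the ranges are disjoint
      ov <- max(0, min(stop[i], stop[j]) - max(start[i], start[j]))
      out[i, j] <- 100 * ov / (stop[j] - start[j])
    }
  }
  out
}

res <- overlap_pct_loop(df$start, df$stop, df$label)
round(res, 1)

# Cross-check against the vectorised outer() version (with the same clamp)
vec <- 100 * with(df, t(pmax(outer(stop, stop, pmin) - outer(start, start, pmax), 0) /
                        (stop - start)))
dimnames(vec) <- list(df$label, df$label)
all.equal(res, vec)
# [1] TRUE

# A range that overlaps nothing gets 0 off the diagonal, not a negative value
overlap_pct_loop(c(2, 50), c(37, 60), c("A", "D"))
```

The loop is O(n²) in interpreted R, so for many ranges the vectorised outer() form is preferable; the loop is mainly useful for checking what each cell means.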