
How to make an overlap table from multiple lists/vectors in R?


I have multiple lists of genes, for example:

listA <- c("geneA", "geneB", "geneC")

listB <- c("geneA", "geneB", "geneD", "geneE")

listC <- c("geneB", "geneF")

...

I'd like to get a table showing the number of overlapping elements between each pair of lists, like:

       listA   listB  listC  ...
listA   3       2      1
listB   2       4      1
listC   1       1      2
...

I know how to get the number of overlaps for a single pair, e.g. length(intersect(listA, listB)), but is there an easier way to generate the whole overlap table?
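For comparison, the pairwise idea already extends to a full table by applying length(intersect(x, y)) to every pair of lists. A minimal sketch, assuming the vectors are first gathered into a named list (gene_lists is a hypothetical name, not from the question):

```r
# The example gene lists from the question, gathered into a named list
# (gene_lists is a hypothetical name used only for illustration)
gene_lists <- list(
  listA = c("geneA", "geneB", "geneC"),
  listB = c("geneA", "geneB", "geneD", "geneE"),
  listC = c("geneB", "geneF")
)

# For every pair (x, y), count the shared genes with length(intersect(x, y));
# the nested sapply() calls simplify the results into a square matrix
pairwise <- sapply(gene_lists, function(x) {
  sapply(gene_lists, function(y) length(intersect(x, y)))
})
pairwise
#       listA listB listC
# listA     3     2     1
# listB     2     4     1
# listC     1     1     2
```

Because intersect() drops duplicates, each cell counts distinct shared genes. The drawback is that this makes one intersect() call per pair of lists, whereas the base-R answer computes all pairs in a single matrix product.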


Solution

  • Here is a way in base R:

    crossprod(table(stack(mget(ls(pattern = "^list")))))
    #       ind
    #ind     listA listB listC
    #  listA     3     2     1
    #  listB     2     4     1
    #  listC     1     1     2
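The one-liner depends on what happens to exist in the global environment: ls(pattern = "^list") matches every object whose name starts with "list", so a leftover object such as lists or list_old would be swept in too. A variant that takes an explicit named list is sketched below. The helper overlap_table and the name gene_lists are hypothetical, and the unique() step is an added safeguard, since table() counts a gene twice if it is repeated within one list.

```r
# Hypothetical helper (not from the original answer): overlap counts for any
# named list of vectors, with no dependence on the global environment
overlap_table <- function(lst) {
  lst <- lapply(lst, unique)    # count each gene at most once per list
  crossprod(table(stack(lst)))  # long format -> incidence table -> pair counts
}

gene_lists <- list(
  listA = c("geneA", "geneB", "geneC"),
  listB = c("geneA", "geneB", "geneD", "geneE"),
  listC = c("geneB", "geneF")
)
overlap <- overlap_table(gene_lists)
overlap

# With a repeated gene, unique() keeps the counts at distinct genes:
# a-a is 2 (not 5) and a-b is 1 (not 2)
dup <- overlap_table(list(a = c("x", "x", "y"), b = "x"))
dup
```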
    

    mget(ls(pattern = "^list")) returns a named list of the objects in your global environment whose names begin with "list".
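To see exactly which objects the pattern picks up, the two calls can be run on their own. A short sketch, assuming the three vectors from the question are the only objects in the environment whose names start with "list" (found and gene_lists are hypothetical names chosen so they do not match the pattern themselves):

```r
listA <- c("geneA", "geneB", "geneC")
listB <- c("geneA", "geneB", "geneD", "geneE")
listC <- c("geneB", "geneF")

# ls() returns the matching object names, sorted alphabetically
found <- ls(pattern = "^list")
found
# [1] "listA" "listB" "listC"

# mget() fetches those objects into a list named after them
gene_lists <- mget(found)
lengths(gene_lists)
```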

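    stack then turns this list into a long data frame with one row per element: values holds the gene and ind (a factor) holds the name of the list it came from.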

    stack(mget(ls(pattern = "^list")))
    #  values   ind
    #1  geneA listA
    #2  geneB listA
    #3  geneC listA
    #4  geneA listB
    #5  geneB listB
    #6  geneD listB
    #7  geneE listB
    #8  geneB listC
    #9  geneF listC
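stack() is a convenience here; the same long data frame can be built by hand with unlist() and rep(), which makes its structure explicit. A sketch for illustration only, assuming the lists are held in a named list (gene_lists and long are hypothetical names):

```r
gene_lists <- list(
  listA = c("geneA", "geneB", "geneC"),
  listB = c("geneA", "geneB", "geneD", "geneE"),
  listC = c("geneB", "geneF")
)

long <- data.frame(
  # every gene, in the order the lists are given
  values = unlist(gene_lists, use.names = FALSE),
  # the name of each gene's source list; the factor levels keep the list
  # order, which fixes the column order of the table built from it
  ind = factor(rep(names(gene_lists), lengths(gene_lists)),
               levels = names(gene_lists)),
  stringsAsFactors = FALSE
)
long
```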
    

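    Calling table on this data frame cross-tabulates genes against lists, giving a count of how often each gene occurs in each list (0 or 1 when no list contains duplicates):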

    out <- table(stack(mget(ls(pattern = "^list"))))
    out
    #       ind
    #values  listA listB listC
    #  geneA     1     1     0
    #  geneB     1     1     1
    #  geneC     1     0     0
    #  geneD     0     1     0
    #  geneE     0     1     0
    #  geneF     0     0     1
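With duplicate-free lists, out is a 0/1 membership (incidence) matrix: genes as rows, lists as columns. An equivalent matrix can be built directly with %in%, which may make the idea easier to see. This is a sketch for comparison only; gene_lists, genes and incidence are hypothetical names.

```r
gene_lists <- list(
  listA = c("geneA", "geneB", "geneC"),
  listB = c("geneA", "geneB", "geneD", "geneE"),
  listC = c("geneB", "geneF")
)

# All distinct genes, sorted the way factor() orders the levels table() uses
genes <- sort(unique(unlist(gene_lists)))

# 1 if the gene (row) belongs to the list (column), 0 otherwise
incidence <- sapply(gene_lists, function(g) as.integer(genes %in% g))
rownames(incidence) <- genes
incidence
```

Unlike table(), this version records membership only, so a gene repeated within a list still contributes a single 1.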
    

    crossprod(out) then computes the same result as the matrix product

    t(out) %*% out
    

    which counts, for every pair of lists, the genes they have in common (the diagonal holds each list's length):

    #       ind
    #ind     listA listB listC
    #  listA     3     2     1
    #  listB     2     4     1
    #  listC     1     1     2
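Cell [i, j] of the product is the sum over genes of out[gene, i] * out[gene, j]. With 0/1 entries each term is 1 only when the gene is in both lists, so the cell is the size of their intersection; on the diagonal it reduces to the size of each list. A quick check of those claims, assuming duplicate-free lists held in a named list (gene_lists and overlap are hypothetical names):

```r
gene_lists <- list(
  listA = c("geneA", "geneB", "geneC"),
  listB = c("geneA", "geneB", "geneD", "geneE"),
  listC = c("geneB", "geneF")
)
out <- table(stack(gene_lists))
overlap <- crossprod(out)

# Same numbers as the explicit product t(out) %*% out
all(overlap == t(out) %*% out)
# [1] TRUE

# Diagonal: the number of genes in each list (3, 4 and 2 here)
diag(overlap)

# An off-diagonal cell equals the pairwise intersection size
overlap["listA", "listB"] == length(intersect(gene_lists$listA, gene_lists$listB))
# [1] TRUE
```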