I have multiple lists of genes, for example:
listA <- c("geneA", "geneB", "geneC")
listB <- c("geneA", "geneB", "geneD", "geneE")
listC <- c("geneB", "geneF")
...
I'd like to get a table to show the # of overlapping elements between the lists, like:
      listA listB listC ...
listA     3     2     1
listB     2     4     1
listC     1     1     2
...
I know how to get the number of overlaps for a single pair, e.g. length(intersect(listA, listB)). But is there an easier way to generate the whole overlap table?
Here is one way in base R:
crossprod(table(stack(mget(ls(pattern = "^list")))))
#       ind
#ind     listA listB listC
#  listA     3     2     1
#  listB     2     4     1
#  listC     1     1     2
mget(ls(pattern = "^list")) will give you a named list of the objects in your global environment whose names begin with "list". stack will then turn this list into the following data frame:
stack(mget(ls(pattern = "^list")))
#  values   ind
#1  geneA listA
#2  geneB listA
#3  geneC listA
#4  geneA listB
#5  geneB listB
#6  geneD listB
#7  geneE listB
#8  geneB listC
#9  geneF listC
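If the vectors do not share a name prefix, or you would rather not depend on what happens to be in the global environment, the same data frame can be built from an explicit named list. This is only a sketch: the name gene_lists is hypothetical and not part of the original question.

```r
# Hypothetical container name; the vectors are the ones from the question.
gene_lists <- list(
  listA = c("geneA", "geneB", "geneC"),
  listB = c("geneA", "geneB", "geneD", "geneE"),
  listC = c("geneB", "geneF")
)

# stack() uses the list names to fill the 'ind' column.
long <- stack(gene_lists)

# Same pipeline as above, without mget(ls(...)).
overlap <- crossprod(table(long))
overlap
```

Any other object in the workspace whose name starts with "list" would be picked up by ls(pattern = "^list"), so an explicit list is the safer choice in a long session.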
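Calling table on this data frame returns a gene-by-list table of counts (1 if the gene is in the list, 0 otherwise, as long as no list contains the same gene twice):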
out <- table(stack(mget(ls(pattern = "^list"))))
out
#       ind
#values  listA listB listC
#  geneA     1     1     0
#  geneB     1     1     1
#  geneC     1     0     0
#  geneD     0     1     0
#  geneE     0     1     0
#  geneF     0     0     1
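crossprod then calculates
t(out) %*% out
which, for each pair of lists, sums the products of their 0/1 columns, i.e. counts the genes the two lists share. It returns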
#       ind
#ind     listA listB listC
#  listA     3     2     1
#  listB     2     4     1
#  listC     1     1     2
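As a sanity check, the result can be compared with the pairwise length(intersect(...)) approach from the question. The sketch below assumes a hypothetical named list gene_lists holding the three vectors, and it deduplicates each vector first, because a gene repeated within one list would make the table contain counts above 1 and the cross product would then no longer equal the intersection size.

```r
# Hypothetical container name; the vectors are the ones from the question.
gene_lists <- list(
  listA = c("geneA", "geneB", "geneC"),
  listB = c("geneA", "geneB", "geneD", "geneE"),
  listC = c("geneB", "geneF")
)

# Drop duplicates within each list so every table cell is 0 or 1.
gene_lists <- lapply(gene_lists, unique)

via_crossprod <- crossprod(table(stack(gene_lists)))

# Pairwise reference: intersection size for every pair of lists.
via_intersect <- sapply(gene_lists, function(x) {
  sapply(gene_lists, function(y) length(intersect(x, y)))
})

# Compare values only; the two matrices carry different dimnames labels.
stopifnot(all(via_crossprod == via_intersect))
```

The nested sapply calls intersect once per ordered pair of lists, so the crossprod version is usually the more convenient choice when there are many lists.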