I have a list that looks like this one:
$`264`
[1] "CHAMP1" "MAP1S" "PRRC1" "TUT1" "CDK12"
$`265`
[1] "TUT1" "PRRC1" "CHAMP1" "MAP1S"
$`266`
[1] "REPS1" "CHAMP1" "PRRC1" "TUT1" "MAP1S"
$`267`
[1] "G3BP1" "TUT1" "PRRC1" "CHAMP1" "MAP1S"
$`268`
[1] "TUT1" "CHAMP1" "PRRC1" "MAP1S"
$`269`
[1] "DDB1" "CHAMP1" "TUT1" "PRRC1" "MAP1S"
Is there any package or function to calculate the similarity among the different list components?
Many thanks
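I'm not aware of any packages, but this implements your own metric (from your comment), i.e. the Jaccard index: the size of the intersection divided by the size of the union. It assumes your list is stored in lst: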
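# Jaccard index between components x and y of lst (x and y are list indices)
siml <- function(x, y) {
  length(intersect(lst[[x]], lst[[y]])) / length(union(lst[[x]], lst[[y]]))
}
# all ordered pairs of indices, including each component paired with itself
z <- expand.grid(x = seq_along(lst), y = seq_along(lst))
result <- mapply(siml, z$x, z$y)
# reshape the vector of scores into a square matrix
dim(result) <- c(length(lst), length(lst))
round(result, 3)  # rounded for display only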
#       [,1] [,2]  [,3]  [,4] [,5]  [,6]
# [1,] 1.000  0.8 0.667 0.667  0.8 0.667
# [2,] 0.800  1.0 0.800 0.800  1.0 0.800
# [3,] 0.667  0.8 1.000 0.667  0.8 0.667
# [4,] 0.667  0.8 0.667 1.000  0.8 0.667
# [5,] 0.800  1.0 0.800 0.800  1.0 0.800
# [6,] 0.667  0.8 0.667 0.667  0.8 1.000
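This is a (slightly) more efficient way to do the same thing. It works on the list elements directly instead of on indices, so the result also keeps the list names as row and column names: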
result <- sapply(lst, function(x)
  sapply(lst, function(y, x) length(intersect(x, y)) / length(union(x, y)), x))
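Since the metric is symmetric, another option is to compute each pair only once and mirror it. The following is just a sketch under some assumptions: the list is rebuilt here from the printed output in the question and called lst, and the helper name jaccard and the matrix name sim are made up for illustration.

```r
# Rebuild the list from the output printed in the question
# (the variable name `lst` is an assumption).
lst <- list(
  `264` = c("CHAMP1", "MAP1S", "PRRC1", "TUT1", "CDK12"),
  `265` = c("TUT1", "PRRC1", "CHAMP1", "MAP1S"),
  `266` = c("REPS1", "CHAMP1", "PRRC1", "TUT1", "MAP1S"),
  `267` = c("G3BP1", "TUT1", "PRRC1", "CHAMP1", "MAP1S"),
  `268` = c("TUT1", "CHAMP1", "PRRC1", "MAP1S"),
  `269` = c("DDB1", "CHAMP1", "TUT1", "PRRC1", "MAP1S")
)

# Hypothetical helper: Jaccard index of two character vectors.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Start from the identity matrix (every component has similarity 1 with
# itself), then fill in each unordered pair once and mirror the value.
n <- length(lst)
sim <- diag(n)
dimnames(sim) <- list(names(lst), names(lst))
pairs <- combn(n, 2)  # 2-row matrix, one column per unordered index pair
for (k in seq_len(ncol(pairs))) {
  i <- pairs[1, k]
  j <- pairs[2, k]
  sim[i, j] <- sim[j, i] <- jaccard(lst[[i]], lst[[j]])
}
round(sim, 3)
```

If you need a distance rather than a similarity (for example to cluster the components), 1 - sim should work, and as.dist(1 - sim) can then be passed to hclust().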