r, set, venn-diagram

How to get counts of intersections of six or more sets?


I am analysing a number of sets and have been using the VennDiagram package. It has worked fine so far, but it handles at most 5 sets, and I now need to look at 6 or more.

Ideally, I'm looking for something that can do this (below) with 6 or more sets. It doesn't necessarily need a plot function, as long as the counts can be retrieved:

[Figure: Venn diagram of 5 sets generated by the VennDiagram package]

Any ideas of what I can do to add one or more sets to these five and still get the counts?

Thanks!


Solution

  • Here's an attempt:

    list1 <- c("a","b","c","e")
    list2 <- c("a","b","c","e")
    list3 <- c("a","b")
    list4 <- c("a","b","g","h")
    list_names <- c("list1","list2","list3","list4")
    
    # Loop over combination sizes: 1 set, 2 sets, ..., all sets
    lapply(seq_along(list_names), function(y) {
        # Each column of `combinations` is one group of y list names
        combinations <- combn(list_names, y)
        # Iterate over columns with lapply so results are never simplified into a matrix
        res <- lapply(seq_len(ncol(combinations)), function(i) {
            x <- combinations[, i]
            # Elements shared by every list in the current combination
            p <- Reduce(intersect, lapply(x, get))
            # Remove elements that also occur in any list outside the combination
            others <- setdiff(list_names, x)
            if (length(others) > 0) p <- setdiff(p, unlist(lapply(others, get)))
            # Mark empty regions with NA
            if (length(p) > 0) p else NA
        })
        names(res) <- apply(combinations, 2, paste, collapse = "-")
        res
    })
    

    The outer lapply loops over y, from 1 to the number of sets. For each y, combn generates every combination of y list names. Taken together, these combinations cover all of the regions of the Venn diagram.

    For each combination, the output is the set difference between the intersection of the lists in the current combination and the union of the lists that are not in it. In other words, it is the elements shared by every list in the combination and found in none of the others.

    The final result is a list with one element per combination size, so its length equals the number of input sets. The first element holds the elements unique to each single list, the second holds the elements found only in each pair of lists, and so on. Empty regions are returned as NA.
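
Since the question asks for counts rather than the elements themselves, here is a minimal sketch of one way to get them for any number of sets. It takes a different route from the code above: instead of enumerating combinations, it labels each distinct element with the names of the sets that contain it and then tabulates the labels. The helper name `region_counts` and the named-list input are assumptions made for illustration, not part of the original answer or of VennDiagram.

```r
# Hypothetical helper (an illustration, not from the original answer):
# count the elements in each exclusive region of the Venn diagram.
# `sets` is assumed to be a named list of vectors.
region_counts <- function(sets) {
    stopifnot(is.list(sets), !is.null(names(sets)))
    # Every distinct element across all sets
    elements <- unique(unlist(sets))
    # Label each element with the names of the sets containing it, joined by "-"
    region <- vapply(elements, function(e) {
        member <- vapply(sets, function(s) e %in% s, logical(1))
        paste(names(sets)[member], collapse = "-")
    }, character(1))
    # Tabulate the labels and return a plain named integer vector
    tab <- table(region)
    setNames(as.integer(tab), names(tab))
}

sets <- list(list1 = c("a","b","c","e"),
             list2 = c("a","b","c","e"),
             list3 = c("a","b"),
             list4 = c("a","b","g","h"))
counts <- region_counts(sets)
print(counts)
```

For the four example lists this reports a count of 2 for list1-list2, 2 for list1-list2-list3-list4 and 2 for list4. Regions with no elements are simply left out, so with 6 or more sets the result only grows with the number of non-empty regions rather than with all possible combinations.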