I have a data frame which shows membership in three color classes. Numbers refer to unique IDs. One ID may be a part of one group or multiple groups.
dat <- data.frame(BLUE  = c(1, 2, 3, 4, 6, NA),
                  RED   = c(2, 3, 6, 7, 9, 13),
                  GREEN = c(4, 6, 8, 9, 10, 11))
or for visual reference:
BLUE  RED  GREEN
   1    2      4
   2    3      6
   3    6      8
   4    7      9
   6    9     10
  NA   13     11
I need to identify and tally individual and cross-group membership (i.e., how many IDs were only in red, how many were in both red and blue, etc.). My desired output is below. Please note that the IDs column is simply for reference; it would not be in the expected output.
COLOR              TOTAL  IDs (reference only, not needed in final output)
RED                2      (7, 13)
BLUE               1      (1)
GREEN              3      (8, 10, 11)
RED, BLUE          3      (2, 3, 6)
RED, GREEN         2      (6, 9)
BLUE, GREEN        2      (4, 6)
RED, BLUE, GREEN   1      (6)
Does anyone know an efficient way to do this in R? Thanks!
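library(dplyr)
library(tidyr)
# Note: gather() and nest(..., .key = ) belong to the older tidyr interface
cbind(dat, row = 1:6) %>%
  gather(COLOR, IDs, -row) %>%                      # long format: one row per (COLOR, ID) pair
  group_by(IDs) %>%
  nest(COLOR, .key = "COLOR") %>%                   # collect the colors of each ID
  mutate(COLOR = sapply(COLOR, as.character)) %>%   # turn each set of colors into a single string
  drop_na %>%                                       # remove the NA padding from BLUE
  group_by(COLOR) %>%
  add_count(name = "TOTAL") %>%                     # number of IDs per color combination
  group_by(COLOR, TOTAL) %>%
  nest(IDs, .key = "IDs") %>%                       # keep the IDs for reference
  as.data.frame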
#>                       COLOR TOTAL       IDs
#> 1                      BLUE     1         1
#> 2          c("BLUE", "RED")     2      2, 3
#> 3        c("BLUE", "GREEN")     1         4
#> 4 c("BLUE", "RED", "GREEN")     1         6
#> 5                       RED     2     7, 13
#> 6         c("RED", "GREEN")     1         9
#> 7                     GREEN     3 8, 10, 11
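For comparison, here is a minimal base R sketch of the same tally, with no extra packages. It is not the code above, just one possible alternative. It assumes each ID appears at most once per column, and the names `long`, `membership`, and `tally` are made up for illustration. Like the output above, it counts each ID once, under the exact set of colors it belongs to.

```r
dat <- data.frame(BLUE  = c(1, 2, 3, 4, 6, NA),
                  RED   = c(2, 3, 6, 7, 9, 13),
                  GREEN = c(4, 6, 8, 9, 10, 11))

# Long format: one row per (ID, color) pair; na.omit() drops the NA padding
long <- na.omit(stack(dat))   # columns: values (the ID), ind (the color)

# For each ID, paste together the colors it appears in
# (colors come out in the column order of dat)
membership <- vapply(split(as.character(long$ind), long$values),
                     paste, character(1), collapse = ", ")

# Count how many IDs share each exact combination of colors
tally <- as.data.frame(table(COLOR = membership),
                       responseName = "TOTAL",
                       stringsAsFactors = FALSE)
print(tally)
```

The IDs behind each row can be recovered with `split(names(membership), membership)` if they are needed for reference.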
There's a more conventional way to deal with NA using the venn package:
library(purrr)
library(magrittr)
library(venn)
as.list(dat) %>%
  map(discard, is.na) %>%
  compact() %>%
  venn() %>%
  print
#>                BLUE RED GREEN counts
#>                   0   0     0      0
#> GREEN             0   0     1      3
#> RED               0   1     0      2
#> RED:GREEN         0   1     1      1
#> BLUE              1   0     0      1
#> BLUE:GREEN        1   0     1      1
#> BLUE:RED          1   1     0      2
#> BLUE:RED:GREEN    1   1     1      1
There are many other packages for Venn diagrams in R, according to this answer. For instance, the VennDiagram::venn.diagram function has an na argument that accepts "stop", "remove", and "none". Here we would use "remove"; however, it only gives us the diagram and not the table. You can explore the possibilities in other packages.
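If only the table is needed and not a diagram, the overlaps can also be computed without a plotting package. Below is a minimal base R sketch (the names `sets`, `combos`, and `overlap` are made up for illustration). Note that it counts every ID shared by a combination of colors regardless of any further membership, which appears to be what the multi-color rows of the desired output in the question show. The tables above instead count each ID only once.

```r
dat <- data.frame(BLUE  = c(1, 2, 3, 4, 6, NA),
                  RED   = c(2, 3, 6, 7, 9, 13),
                  GREEN = c(4, 6, 8, 9, 10, 11))

# One vector of IDs per color, with the NA padding removed
sets <- lapply(dat, function(x) x[!is.na(x)])

# Every combination of two or more colors, as a list of character vectors
combos <- unlist(lapply(2:length(sets),
                        function(k) combn(names(sets), k, simplify = FALSE)),
                 recursive = FALSE)

# Size of the intersection of the sets in each combination
overlap <- data.frame(
  COLOR = vapply(combos, paste, character(1), collapse = ", "),
  TOTAL = vapply(combos,
                 function(cs) length(Reduce(intersect, sets[cs])),
                 integer(1))
)
print(overlap)
```

For this data the rows are BLUE, RED; BLUE, GREEN; RED, GREEN; and BLUE, RED, GREEN, in that order.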