I have a large dataset with columns IDNum, Var1, Var2, Var3, Var4, Var5, Var6. The variables are boolean, each with value 0 or 1, so each row is one of 64 possible combinations. I would like to count the number of rows corresponding to each combination present. Is there an efficient way to write this in R?
aggregate can do this. Here's a shorter example:
r <- function() rbinom(10, 1, .5)
d <- data.frame(IDNum=1:10, Var1=r(), Var2=r())
d
   IDNum Var1 Var2
1      1    0    1
2      2    0    1
3      3    0    0
4      4    1    0
5      5    1    1
6      6    0    0
7      7    1    1
8      8    1    0
9      9    0    1
10    10    0    1
Now to count the number of each combination:
> aggregate(d$IDNum, d[-1], FUN=length)
  Var1 Var2 x
1    0    0 2
2    1    0 2
3    0    1 4
4    1    1 2
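Here d[-1] drops the first column (IDNum), so the remaining columns, Var1 and Var2, become the grouping variables. The values in d$IDNum aren't actually used, but something must be passed to the length function: the values of d$IDNum within each combination are passed to length to get the count.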
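For comparison, here is a minimal sketch that rebuilds the ten rows printed above as fixed values (so the counts are reproducible) and computes the same counts two ways: with aggregate as in the answer, and with base R's table as an alternative. The names counts, tab, and n are illustrative choices, not part of the original answer.

```r
# Rebuild the example data frame from the printed rows (fixed values, no randomness)
d <- data.frame(
  IDNum = 1:10,
  Var1  = c(0, 0, 0, 1, 1, 0, 1, 1, 0, 0),
  Var2  = c(1, 1, 0, 0, 1, 0, 1, 0, 1, 1)
)

# Approach from the answer: group by every column except IDNum and count rows
counts <- aggregate(d$IDNum, d[-1], FUN = length)
names(counts)[names(counts) == "x"] <- "n"  # rename the default "x" column

# Alternative: cross-tabulate the grouping columns, then flatten to a data frame
tab <- as.data.frame(table(d[-1]), responseName = "n")
tab <- tab[tab$n > 0, ]  # table() also lists combinations that never occur

print(counts)
print(tab)
```

With six variables the same calls should work unchanged, since d[-1] selects Var1 through Var6 once IDNum is dropped. One difference to keep in mind: aggregate returns only the combinations that occur in the data, while table enumerates every combination of the observed values and reports zero counts unless they are filtered out.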