I have a large matrix filled with True/False values under each column. Is there a way I can summarize the matrix so that every row is unique and I have a new column with the sum of how often that row appeared.
Example:
A B C D E
[1] T F F T F
[2] T T T F F
[3] T F F T T
[4] T T T F F
[5] T F F T F
Would become:
A B C D E total
[1] T F F T F 2
[2] T T T F F 2
[3] T F F T F 1
I cbind this matrix with a new column rev so I now have a data.frame that looks like
A B C D E rev
[1] T F F T F 2
[2] T T T F F 3
[3] T F F T T 5
[4] T T T F F 2
[5] T F F T F 1
And would like a data.frame that also sums the rev column as follows:
A B C D E rev total
[1] T F F T F 3 2
[2] T T T F F 5 2
[3] T F F T T 5 1
An approach with dplyr
:
use as.data.frame
(or here as_tibble
) first if you start from a matrix
. In the end you need to have a data.frame
anyway as you'll have both numeric
and logical
in your table.
mat <- matrix(
c(T, F, F, T, F, T, T, T, F, F, T, F, F, T, T, T, T, T, F, F, T, F, F, T, F),
ncol = 5,
byrow = TRUE,
dimnames = list(NULL, LETTERS[1:5])
)
library(dplyr)
mat %>%
as_tibble %>% # convert matrix to tibble, to be able to group
group_by_all %>% # group by every column so we can count by group of equal values
tally %>% # tally will add a count column and keep distinct grouped values
ungroup # ungroup the table to be clean
#> # A tibble: 3 x 6
#> A B C D E n
#> <lgl> <lgl> <lgl> <lgl> <lgl> <int>
#> 1 TRUE FALSE FALSE TRUE FALSE 2
#> 2 TRUE FALSE FALSE TRUE TRUE 1
#> 3 TRUE TRUE TRUE FALSE FALSE 2
Created on 2018-05-29 by the reprex package (v0.2.0).
And a base solution:
df <- as.data.frame(mat)
df$n <- 1
aggregate(n~.,df,sum)
# A B C D E n
# 1 TRUE TRUE TRUE FALSE FALSE 2
# 2 TRUE FALSE FALSE TRUE FALSE 2
# 3 TRUE FALSE FALSE TRUE TRUE 1
Or as a one liner: aggregate(n~.,data.frame(mat,n=1),sum)