Search code examples
rmatrixsummarize

Summarise a logical Matrix


I have a large matrix filled with True/False values under each column. Is there a way I can summarize the matrix so that every row is unique and I have a new column with the sum of how often that row appeared.

Example:

    A B C D E
[1] T F F T F
[2] T T T F F
[3] T F F T T
[4] T T T F F
[5] T F F T F

Would become:

    A B C D E total
[1] T F F T F  2
[2] T T T F F  2
[3] T F F T F  1

EDIT

I cbind this matrix with a new column rev so I now have a data.frame that looks like

    A B C D E rev
[1] T F F T F  2
[2] T T T F F  3
[3] T F F T T  5
[4] T T T F F  2
[5] T F F T F  1

And would like a data.frame that also sums the rev column as follows:

    A B C D E rev total
[1] T F F T F  3    2
[2] T T T F F  5    2 
[3] T F F T T  5    1

Solution

  • An approach with dplyr :

    use as.data.frame (or here as_tibble) first if you start from a matrix. In the end you need to have a data.frame anyway as you'll have both numeric and logical in your table.

    mat <- matrix(
     c(T, F, F, T, F, T, T, T, F, F, T, F, F, T, T, T, T, T, F, F, T, F, F, T, F),
     ncol = 5,
     byrow = TRUE,
     dimnames = list(NULL, LETTERS[1:5])
    )
    
    library(dplyr)
    mat %>%
      as_tibble %>%    # convert matrix to tibble, to be able to group
      group_by_all %>% # group by every column so we can count by group of equal values
      tally %>%        # tally will add a count column and keep distinct grouped values
      ungroup          # ungroup the table to be clean
    #> # A tibble: 3 x 6
    #>   A     B     C     D     E         n
    #>   <lgl> <lgl> <lgl> <lgl> <lgl> <int>
    #> 1 TRUE  FALSE FALSE TRUE  FALSE     2
    #> 2 TRUE  FALSE FALSE TRUE  TRUE      1
    #> 3 TRUE  TRUE  TRUE  FALSE FALSE     2
    

    Created on 2018-05-29 by the reprex package (v0.2.0).

    And a base solution:

    df <- as.data.frame(mat)
    df$n <- 1
    aggregate(n~.,df,sum)
    #      A     B     C     D     E n
    # 1 TRUE  TRUE  TRUE FALSE FALSE 2
    # 2 TRUE FALSE FALSE  TRUE FALSE 2
    # 3 TRUE FALSE FALSE  TRUE  TRUE 1
    

    Or as a one liner: aggregate(n~.,data.frame(mat,n=1),sum)