r, dplyr, pivot-table

Frequency cross tabulation with combinations of binary variables in R


I have a data frame which looks like this:

df <- data.frame(dummyA=c(0,0,1,0,1), dummyB=c(1,1,0,1,1), dummyC=c(0,0,1,1,1))

> df
  dummyA dummyB dummyC
1      0      1      0
2      0      1      0
3      1      0      1
4      0      1      1
5      1      1      1

Is there a way to cross-tabulate the frequencies of all pairwise combinations, as described below? I need the result in matrix format.


There are two rows with dummyA=1 (rows 3 and 5), so the top-left cell is 2. There is one row (row 5) with dummyA=1 and dummyB=1, so the middle-left cell is 1. Two rows (rows 3 and 5) have dummyA=1 and dummyC=1, so the bottom-left cell is 2.

There are a lot of posts that deal with cross tabulation in R, but I have not found this type of tabulation yet. I prefer dplyr for data manipulation, but any suggestions are highly appreciated.


Solution

    df <- data.frame(dummyA=c(0,0,1,0,1), dummyB=c(1,1,0,1,1), dummyC=c(0,0,1,1,1))
    
    df
    #>   dummyA dummyB dummyC
    #> 1      0      1      0
    #> 2      0      1      0
    #> 3      1      0      1
    #> 4      0      1      1
    #> 5      1      1      1
    
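    # Convert the data frame to a numeric matrix so matrix algebra applies
    m <- as.matrix(df)
    
    # t(m) %*% m: cell (i, j) counts the rows where columns i and j are both 1
    crossprod(m, m)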
    #>        dummyA dummyB dummyC
    #> dummyA      2      1      2
    #> dummyB      1      4      2
    #> dummyC      2      2      3
    

    Created on 2023-11-21 with reprex v2.0.2
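
As a sanity check, the same table can be built by counting rows directly. This is only a minimal sketch, assuming every column contains nothing but 0 and 1 with no missing values; the names `vars` and `counts` are introduced here for illustration and are not part of the original answer. For each pair of columns it counts the rows where both equal 1, which is what the question describes, and then compares the result with `crossprod()`.

```r
df <- data.frame(dummyA = c(0, 0, 1, 0, 1),
                 dummyB = c(1, 1, 0, 1, 1),
                 dummyC = c(0, 0, 1, 1, 1))

vars <- names(df)

# For every pair of columns (a, b), count the rows where both are 1.
# The outer sapply() builds the columns of the result, the inner one the rows.
counts <- sapply(vars, function(a) {
  sapply(vars, function(b) sum(df[[a]] == 1 & df[[b]] == 1))
})

counts

# Should agree cell by cell with the matrix-product version
all(counts == crossprod(as.matrix(df)))
```

The two agree because, for 0/1 data, the product of two entries is 1 only when both are 1, so each cell of `t(m) %*% m` is a count of co-occurrences. The diagonal holds the number of 1s in each column. If the columns could contain other values or `NA`, the two approaches may differ and the data would need cleaning first.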