Search code examples
rdataframepivot-table

Cross table of several columns in R


I have this dataset called x for example:

client_id a b c
1 1 0 0
2 0 1 1
3 0 0 1
4 1 1 1

I then want to create another table, that calculates the number of times each column from a to c had 1, in combination of other columns having 1 too.

So the result would be this table:

a b c
a 1 1 1
b 1 0 2
c 1 2 1

So the first line of the second table would read like: How many times there was only column a having a 1? it's once only. How many times there was 1 in both a and b? it's only once. How many times there was 1 in both a and c? it happened only once.

How can I do this easily in R? note that I only want to focus on when the column had 1.


Solution

  • You can use crossprod to build the off-diagonal entries. For later referencing, this is sometimes called a co-occurrence matrix.

    To create the diagonal, it's a bit more tricky: find rows that sums up to 1 and apply a column-wise sum.

    mat <- as.matrix(df[-1])
    out <- crossprod(mat)  # Same as: t(mat) %*% mat
    diag(out) <- colSums(mat[rowSums(mat) == 1, ])
    out
    
    #   a b c
    # a 1 1 1
    # b 1 0 2
    # c 1 2 1