I have this dataset called x for example:
client_id | a | b | c |
---|---|---|---|
1 | 1 | 0 | 0 |
2 | 0 | 1 | 1 |
3 | 0 | 0 | 1 |
4 | 1 | 1 | 1 |
I then want to create another table, that calculates the number of times each column from a to c had 1, in combination of other columns having 1 too.
So the result would be this table:
a | b | c | |
---|---|---|---|
a | 1 | 1 | 1 |
b | 1 | 0 | 2 |
c | 1 | 2 | 1 |
So the first line of the second table would read like: How many times there was only column a having a 1? it's once only. How many times there was 1 in both a and b? it's only once. How many times there was 1 in both a and c? it happened only once.
How can I do this easily in R? note that I only want to focus on when the column had 1.
You can use crossprod
to build the off-diagonal entries. For later referencing, this is sometimes called a co-occurrence matrix.
To create the diagonal, it's a bit more tricky: find rows that sums up to 1 and apply a column-wise sum.
mat <- as.matrix(df[-1])
out <- crossprod(mat) # Same as: t(mat) %*% mat
diag(out) <- colSums(mat[rowSums(mat) == 1, ])
out
# a b c
# a 1 1 1
# b 1 0 2
# c 1 2 1