I searched the internet for a similar solution, but I was not able to find one for my specific case. Let's say I have the following data frame:
a = c(1, 1, 1, 2, 2)
b = c(2, 1, 1, 1, 2)
c = c(2, 2, 1, 1, 1)
d = c(1, 2, 2, 1, 1)
df <- data.frame(a = a, b = b, c = c, d = d)
and df looks like this:
a b c d
1 1 2 2 1
2 1 1 2 2
3 1 1 1 2
4 2 1 1 1
5 2 2 1 1
Note: In this example I use the pair of values [1,2], but it could be a different set of values, e.g. [-1,1], or even more than two possible values, e.g. [-1,1,2].
Now I would like to have a matrix where each [i,j] element represents the number of rows in which both column i and column j have the value 1. For this particular case we get (showing only the upper triangle, because the matrix is symmetric):
  a b c d
a 3 2 1 1
b   3 2 1
c     3 2
d       3
The diagonal should count the number of rows with the value 1 in a given column; in this case every column has the same number of 1 values (three). The format should be similar to the output of the cor() function (a correlation matrix).
I tried table() (and also crosstab from the descr package), but they show the information for only one pair of columns at a time. It can be done by manually counting the occurrences of 1 for each pair of columns (e.g. nrow(df[df$a == 1 & df$b == 1, ]) returns 2) and then putting the counts into a matrix, but I was wondering if there is a built-in function that simplifies the process.
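For reference, the manual approach described above could be written as a double loop over all column pairs. This is only a sketch of the hand-rolled baseline (the names cols and manual are illustrative, not from the original post); it is handy for checking any built-in approach against:

```r
df <- data.frame(a = c(1, 1, 1, 2, 2),
                 b = c(2, 1, 1, 1, 2),
                 c = c(2, 2, 1, 1, 1),
                 d = c(1, 2, 2, 1, 1))

cols <- names(df)
# Pre-allocate a square integer matrix named by column on both dimensions
manual <- matrix(0L, nrow = length(cols), ncol = length(cols),
                 dimnames = list(cols, cols))
for (i in cols) {
  for (j in cols) {
    # Number of rows where both column i and column j equal 1
    manual[i, j] <- sum(df[[i]] == 1 & df[[j]] == 1)
  }
}
manual
#   a b c d
# a 3 2 1 1
# b 2 3 2 1
# c 1 2 3 2
# d 1 1 2 3
```

This works, but it scales with the square of the number of columns in interpreted R code, which is why a vectorised built-in is preferable.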
We can use crossprod on a logical matrix to count the co-occurrences of the value 1 in the question's example:
m1 <- as.matrix(df == 1) # see Note[1]
out <- crossprod(m1)
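Why this works: crossprod(m1) is equivalent to t(m1) %*% m1, and logical values are treated as 1/0 in matrix arithmetic, so entry [i, j] is the sum over rows of m1[, i] * m1[, j], i.e. the number of rows where both columns are 1. On the diagonal this reduces to colSums(m1). A small self-contained check of both claims on the question's data:

```r
df <- data.frame(a = c(1, 1, 1, 2, 2),
                 b = c(2, 1, 1, 1, 2),
                 c = c(2, 2, 1, 1, 1),
                 d = c(1, 2, 2, 1, 1))

m1 <- as.matrix(df == 1)   # logical matrix: TRUE where the value is 1
out <- crossprod(m1)       # same result as t(m1) %*% m1

# crossprod() is a faster way of computing t(x) %*% x
stopifnot(isTRUE(all.equal(out, t(m1) %*% m1)))
# The diagonal is the per-column count of 1s
stopifnot(isTRUE(all.equal(diag(out), colSums(m1))))
```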
Note[1]: Pointed out by @imo (see comments below) to address the general case (a matrix with values [x,y]). For a matrix with [0,1] values, df == 1 can be replaced by df. To count the 2 values from the question's example, use df == 2 instead.
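Following that note, the same idea can be wrapped in a small helper that takes the value to count as an argument, which also covers the question's other cases such as [-1,1] or [-1,1,2]. count_cooccurrence is a hypothetical name used only for this sketch, not a function from base R or any package:

```r
# Hypothetical helper: pairwise counts of rows where both columns equal `value`
count_cooccurrence <- function(df, value) {
  m <- as.matrix(df == value)  # logical indicator matrix for `value`
  crossprod(m)                 # [i, j] = rows where columns i and j both equal `value`
}

df <- data.frame(a = c(1, 1, 1, 2, 2),
                 b = c(2, 1, 1, 1, 2),
                 c = c(2, 2, 1, 1, 1),
                 d = c(1, 2, 2, 1, 1))

count_cooccurrence(df, 1)
count_cooccurrence(df, 2)

# One matrix per distinct value found in the data, named by that value
vals <- sort(unique(unlist(df)))
per_value <- lapply(setNames(vals, vals), function(v) count_cooccurrence(df, v))
```

Building one matrix per distinct value keeps each result in the same cor()-like shape, rather than mixing several values into a single table.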
If the lower triangle should be NA (or 0) instead:
out[lower.tri(out)] <- NA
out
# a b c d
#a 3 2 1 1
#b NA 3 2 1
#c NA NA 3 2
#d NA NA NA 3
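If 0 is preferred over NA for the lower triangle, only the assigned value changes. The sketch below recomputes the matrix so it runs on its own (out0 and out_na are illustrative names); it also shows print() with na.print = "" as one way to display the NA version with blanks, similar to the layout in the question:

```r
df <- data.frame(a = c(1, 1, 1, 2, 2),
                 b = c(2, 1, 1, 1, 2),
                 c = c(2, 2, 1, 1, 1),
                 d = c(1, 2, 2, 1, 1))

# Variant 1: zero out the entries below the diagonal
out0 <- crossprod(as.matrix(df == 1))
out0[lower.tri(out0)] <- 0
out0
#   a b c d
# a 3 2 1 1
# b 0 3 2 1
# c 0 0 3 2
# d 0 0 0 3

# Variant 2: keep NA but print it as blank cells
out_na <- crossprod(as.matrix(df == 1))
out_na[lower.tri(out_na)] <- NA
print(out_na, na.print = "")
```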