R - Create a Matrix of Variables with the frequency of same values


I am facing a problem in R that I can't solve myself.

I have a data frame that looks like this, but with more variables and cases:

ID      Var1   Var2   Var3   Var4
1          1      0      1      1
2          0      0      0      0
3          1      1      1      1
4          1      1      0      1
5          1      0      1      0

I would like to have a matrix, similar to a correlation matrix, that shows how often each pair of variables has the same value, for example the value "1". The resulting matrix for the data frame above should then look like this:

           Var1   Var2   Var3   Var4
Var1                2      3      3
Var2                       1      2
Var3                              2
Var4                              

Perhaps you can help. Thank you in advance.


Solution

  • First, create a logical matrix that tests each cell for your value, here 1.

    e <- d[-1] == 1  ## value to test
    

    Then use outer to compare the columns pairwise with a function FUN that counts the rows in which both columns are TRUE, i.e. where the two logicals sum to 2. From the result you apparently want to blank out the lower triangle, including the diagonal, which lower.tri selects.

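    # Count the rows in which columns i and j are both TRUE.
    FUN <- Vectorize(function(i, j) sum(e[, i] + e[, j] == 2))
    res <- outer(1:ncol(e), 1:ncol(e), FUN)  # symmetric, so no t() is needed
    # Blank out the lower triangle and the diagonal.
    res[lower.tri(res, diag=TRUE)] <- NA
    res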
    #      [,1] [,2] [,3] [,4]
    # [1,]   NA    2    3    3
    # [2,]   NA   NA    1    2
    # [3,]   NA   NA   NA    2
    # [4,]   NA   NA   NA   NA
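
As a possible alternative, not part of the answer above, the same counts can be obtained with a matrix product. This is a minimal sketch that assumes the data frame is called `d` as in the answer, rebuilds it from the table in the question, and assumes it contains no NA values; the name `res2` is introduced here only for illustration.

```r
# Example data copied from the question; the name `d` follows the answer.
d <- data.frame(
  ID   = 1:5,
  Var1 = c(1, 0, 1, 1, 1),
  Var2 = c(0, 0, 1, 1, 0),
  Var3 = c(1, 0, 1, 0, 1),
  Var4 = c(1, 0, 1, 1, 0)
)

# Logical matrix: TRUE where a cell equals the value of interest (here 1).
e <- as.matrix(d[-1]) == 1

# Multiplying by 1 turns TRUE/FALSE into 1/0. crossprod(x) is t(x) %*% x,
# so entry [i, j] counts the rows in which columns i and j are both 1.
res2 <- crossprod(e * 1)

# Blank out the lower triangle and the diagonal, as in the answer.
res2[lower.tri(res2, diag = TRUE)] <- NA
res2
```

Because `crossprod` carries the column names of `e` over to both dimensions, the result is labelled Var1 to Var4 as in the desired output, and entries can be looked up by name, for example `res2["Var1", "Var3"]`.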