Tags: r, matrix, missing-data

Check that at least 2 columns in a matrix have at least 3 non-missing values in the same rows (for a pairwise test)


Say I have a matrix like the following:

set.seed(123)
newmat=matrix(rnorm(25),ncol=5)
colnames(newmat)=paste0('mark',1:5)
rownames(newmat)=paste0('id',1:5)
newmat[,2]=NA
newmat[c(2,5),4]=NA
newmat[c(1,4,5),5]=NA
newmat[1,1]=NA
newmat[5,3]=NA

> newmat
          mark1 mark2     mark3      mark4      mark5
id1          NA    NA 1.2240818  1.7869131         NA
id2 -0.23017749    NA 0.3598138         NA -0.2179749
id3  1.55870831    NA 0.4007715 -1.9666172 -1.0260044
id4  0.07050839    NA 0.1106827  0.7013559         NA
id5  0.12928774    NA        NA         NA         NA

All I want is an easy way to check that at least 2 columns have at least 3 non-NA values, and that those values sit in the same rows in both columns.

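In the case above, the pair of columns 1 and 3 fulfills this (both have values in rows id2, id3 and id4), and so does the pair of columns 3 and 4 (rows id1, id3 and id4). The pair of columns 1 and 4 does not, since they share values only in rows id3 and id4. So 3 columns in total (1, 3 and 4) take part in a valid pair.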

How could I do this check in R? I know it would involve something like colSums(!is.na(newmat)), but I'm not sure about the rest. Thanks!


Solution

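  • Here is a matrix (obtained with crossprod + is.na) that shows which pairs fulfill your objective. Each entry of crossprod(!is.na(newmat)) counts the rows in which both columns are non-NA; the diagonal is set to 0 so that a column is not paired with itself.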

    > `diag<-`(crossprod(!is.na(newmat)), 0) >= 3
          mark1 mark2 mark3 mark4 mark5
    mark1 FALSE FALSE  TRUE FALSE FALSE
    mark2 FALSE FALSE FALSE FALSE FALSE
    mark3  TRUE FALSE FALSE  TRUE FALSE
    mark4 FALSE FALSE  TRUE FALSE FALSE
    mark5 FALSE FALSE FALSE FALSE FALSE
    

    As we can see, the pairs (mark1, mark3) and (mark3, mark4) are the ones that meet the condition.
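
If a single TRUE/FALSE answer or an explicit list of the pairs is more convenient than the logical matrix, the same crossprod result can be reduced further. The following is only a sketch under the question's setup: the names shared, min_shared, has_valid_pair, valid_pairs and valid_cols are made up for illustration and are not part of the original answer.

```r
# Rebuild the example matrix from the question.
set.seed(123)
newmat <- matrix(rnorm(25), ncol = 5)
colnames(newmat) <- paste0("mark", 1:5)
rownames(newmat) <- paste0("id", 1:5)
newmat[, 2] <- NA
newmat[c(2, 5), 4] <- NA
newmat[c(1, 4, 5), 5] <- NA
newmat[1, 1] <- NA
newmat[5, 3] <- NA

# shared[i, j] = number of rows in which columns i and j are both non-NA.
shared <- crossprod(!is.na(newmat))

# Threshold from the question (illustrative name).
min_shared <- 3

# Keep only the upper triangle so each pair is considered once and the
# diagonal (a column paired with itself) is ignored.
ok <- shared >= min_shared & upper.tri(shared)

# Single TRUE/FALSE: does at least one qualifying pair of columns exist?
has_valid_pair <- any(ok)

# Names of the qualifying pairs, one pair per row of a character matrix.
idx <- which(ok, arr.ind = TRUE)
valid_pairs <- cbind(colnames(newmat)[idx[, "row"]],
                     colnames(newmat)[idx[, "col"]])

# Distinct columns that take part in at least one qualifying pair.
valid_cols <- unique(colnames(newmat)[as.vector(idx)])
```

Using upper.tri() instead of zeroing the diagonal avoids reporting each pair twice, which is handy when the pairs are fed one by one into a pairwise test.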