Say I have a matrix like the following:
set.seed(123)
newmat=matrix(rnorm(25),ncol=5)
colnames(newmat)=paste0('mark',1:5)
rownames(newmat)=paste0('id',1:5)
newmat[,2]=NA
newmat[c(2,5),4]=NA
newmat[c(1,4,5),5]=NA
newmat[1,1]=NA
newmat[5,3]=NA
> newmat
          mark1 mark2     mark3      mark4      mark5
id1          NA    NA 1.2240818  1.7869131         NA
id2 -0.23017749    NA 0.3598138         NA -0.2179749
id3  1.55870831    NA 0.4007715 -1.9666172 -1.0260044
id4  0.07050839    NA 0.1106827  0.7013559         NA
id5  0.12928774    NA        NA         NA         NA
The only thing I want to check, in an easy way, is that there are at least 2 columns with at least 3 non-NA values each, and also that those columns have their values in the same rows.
In the case above, the pair of columns 1 and 3 fulfills this, as does the pair of columns 3 and 4; the pair of columns 1 and 4 does not. That makes a total of 3 columns.
How could I do this check in R? I know I'd do something involving colSums(!is.na(newmat)), but I'm not sure about the rest... Thanks!
Here is a matrix (obtained by using crossprod + is.na) that shows which pairs fulfil your objective:
> `diag<-`(crossprod(!is.na(newmat)), 0) >= 3
      mark1 mark2 mark3 mark4 mark5
mark1 FALSE FALSE  TRUE FALSE FALSE
mark2 FALSE FALSE FALSE FALSE FALSE
mark3  TRUE FALSE FALSE  TRUE FALSE
mark4 FALSE FALSE  TRUE FALSE FALSE
mark5 FALSE FALSE FALSE FALSE FALSE
As we can see, the pairs (mark1, mark3) and (mark3, mark4) are the desired output.
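If you want the qualifying pairs as a list rather than reading them off a logical matrix, something along these lines might help. It is only a sketch built on the same crossprod idea: entry (i, j) of crossprod(!is.na(newmat)) counts the rows where columns i and j are both non-NA. The names shared, ok, idx, pairs, has_pair and cols_involved are made up for illustration, and the threshold of 3 is taken from the question.

```r
# Rebuild the example matrix from the question.
set.seed(123)
newmat <- matrix(rnorm(25), ncol = 5,
                 dimnames = list(paste0("id", 1:5), paste0("mark", 1:5)))
newmat[, 2] <- NA
newmat[c(2, 5), 4] <- NA
newmat[c(1, 4, 5), 5] <- NA
newmat[1, 1] <- NA
newmat[5, 3] <- NA

# shared[i, j] = number of rows where columns i and j are both non-NA.
shared <- crossprod(!is.na(newmat))

# Keep each unordered pair once; upper.tri() also drops the diagonal,
# so a column is never paired with itself.
ok <- shared >= 3 & upper.tri(shared)
idx <- which(ok, arr.ind = TRUE)
pairs <- data.frame(col1 = colnames(newmat)[idx[, "row"]],
                    col2 = colnames(newmat)[idx[, "col"]],
                    stringsAsFactors = FALSE)

# Overall yes/no check, plus the columns that appear in any qualifying pair.
has_pair <- nrow(pairs) > 0
cols_involved <- sort(unique(c(pairs$col1, pairs$col2)))
```

Here pairs has one row per qualifying pair, has_pair answers the "is there at least one such pair" question, and cols_involved gives the columns taking part in any qualifying pair.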