
How to compare each value at the same position from multiple datasets in R


Suppose I have three datasets which have the same form.

set.seed(1)
df1 <- as.data.frame(matrix(sample(1:4, 15, replace = TRUE), nrow = 5))
set.seed(2)
df2 <- as.data.frame(matrix(sample(1:4, 15, replace = TRUE), nrow = 5))
set.seed(3)
df3 <- as.data.frame(matrix(sample(1:4, 15, replace = TRUE), nrow = 5))

df1              df2               df3
  V1 V2 V3         V1 V2 V3          V1 V2 V3
1  2  4  1       1  1  4  3        1  1  3  3
2  2  4  1       2  3  1  1        2  4  1  3
3  3  3  3       3  3  4  4        3  2  2  3
4  4  3  2       4  1  2  1        4  2  3  3
5  1  1  4       5  4  3  2        5  3  3  4

What I want to do is assign 1 to every position where at least one of the three values at that position across the three datasets is greater than 3, and 0 otherwise. The output I expect is

newdf
  V1 V2 V3  
1  0  1  0
2  1  1  0
3  0  1  1
4  1  0  0
5  1  0  1 

Merging the three datasets into one might be a solution, but because my data is very large, I doubt that is a good idea. Any suggestions would be appreciated!


Solution

  • Here's a possible solution that saves you from merging the data sets:

    (((df1 > 3L) + (df2 > 3L) + (df3 > 3L)) > 0L) + 0L
    #      V1 V2 V3
    # [1,]  0  1  0
    # [2,]  1  1  0
    # [3,]  0  1  1
    # [4,]  1  0  0
    # [5,]  1  0  1
    

    Or similarly

    (Reduce(`+`, list(df1 > 3L, df2 > 3L, df3 > 3L)) > 0L) + 0L
    

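    The idea is to test whether each value in each data set is greater than 3, add up the resulting TRUE/FALSE values (TRUE counts as 1), check whether the sum is greater than 0, and convert the result to integers by adding 0L. This works because > is a generic function with a data.frame method that returns a logical matrix with the same dimensions as the data set, and + then works element-wise on those matrices, so the result is a matrix rather than a data frame. See ?Ops and, more specifically, methods(Ops).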
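The steps described above can be sketched as a small function that takes any number of data sets. This is only an illustration: `any_above` is a hypothetical helper name, not something from the answer, and the three data frames are typed in by hand from the tables in the question, because `sample()` may return different values for the same seed on a different R version. The final `as.data.frame()` call is an optional extra for when a data frame is wanted instead of a matrix.

```r
# Data frames copied by hand from the tables shown in the question.
df1 <- data.frame(V1 = c(2, 2, 3, 4, 1), V2 = c(4, 4, 3, 3, 1), V3 = c(1, 1, 3, 2, 4))
df2 <- data.frame(V1 = c(1, 3, 3, 1, 4), V2 = c(4, 1, 4, 2, 3), V3 = c(3, 1, 4, 1, 2))
df3 <- data.frame(V1 = c(1, 4, 2, 2, 3), V2 = c(3, 1, 2, 3, 3), V3 = c(3, 3, 3, 3, 4))

# Hypothetical helper: returns 1 in every cell where at least one of the
# data frames in `dfs` holds a value greater than `threshold`, else 0.
any_above <- function(dfs, threshold) {
  hits   <- lapply(dfs, function(d) d > threshold)  # one logical matrix per data frame
  counts <- Reduce(`+`, hits)                       # per-cell number of TRUE values
  as.data.frame((counts > 0L) + 0L)                 # 0/1 integers, back to a data frame
}

newdf <- any_above(list(df1, df2, df3), 3)
newdf
```

Because the data sets are passed as a list, the same call should work unchanged for four or more data sets of identical shape, and nothing is merged along the way.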