Suppose I have three datasets which have the same form.
# Three 5 x 3 data frames of random integers between 1 and 4
set.seed(1)
df1 <- as.data.frame(matrix(sample(1:4, 15, replace = TRUE), nrow = 5))
set.seed(2)
df2 <- as.data.frame(matrix(sample(1:4, 15, replace = TRUE), nrow = 5))
set.seed(3)
df3 <- as.data.frame(matrix(sample(1:4, 15, replace = TRUE), nrow = 5))
      df1             df2             df3
  V1 V2 V3        V1 V2 V3        V1 V2 V3
1  2  4  1      1  1  4  3      1  1  3  3
2  2  4  1      2  3  1  1      2  4  1  3
3  3  3  3      3  3  4  4      3  2  2  3
4  4  3  2      4  1  2  1      4  2  3  3
5  1  1  4      5  4  3  2      5  3  3  4
What I want to do is assign a value of 1 to each position where at least one of the three values in that position across the three datasets is greater than 3, and 0 otherwise. The output I expect is:
newdf
V1 V2 V3
1 0 1 0
2 1 1 0
3 0 1 1
4 1 0 0
5 1 0 1
Merging the three datasets into one might be a solution, but because my data is very large, I doubt that is a good idea. Any suggestions would be appreciated!
Here's a possible solution that saves you from merging the data sets:
(((df1 > 3L) + (df2 > 3L) + (df3 > 3L)) > 0L) + 0L
# V1 V2 V3
# [1,] 0 1 0
# [2,] 1 1 0
# [3,] 0 1 1
# [4,] 1 0 0
# [5,] 1 0 1
Or, similarly:
(Reduce(`+`, list(df1 > 3L, df2 > 3L, df3 > 3L)) > 0L) + 0L
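If the data sets are already collected in a list, the Reduce form extends to any number of them. Here is a minimal sketch, assuming every data frame has the same dimensions and column names. The helper name flag_any_above and its threshold argument are made up for illustration, and the data is typed in from the tables printed in the question so the result does not depend on how sample() behaves in a given R version.

```r
# Data copied from the tables printed in the question
df1 <- data.frame(V1 = c(2, 2, 3, 4, 1), V2 = c(4, 4, 3, 3, 1), V3 = c(1, 1, 3, 2, 4))
df2 <- data.frame(V1 = c(1, 3, 3, 1, 4), V2 = c(4, 1, 4, 2, 3), V3 = c(3, 1, 4, 1, 2))
df3 <- data.frame(V1 = c(1, 4, 2, 2, 3), V2 = c(3, 1, 2, 3, 3), V3 = c(3, 3, 3, 3, 4))

# Hypothetical helper (not part of the original answer): returns a 0/1 matrix
# marking positions where at least one data frame in `dfs` exceeds `threshold`.
flag_any_above <- function(dfs, threshold = 3L) {
  # Each comparison yields a logical matrix with the data frame's dimensions
  exceeds <- lapply(dfs, function(d) d > threshold)
  # Add the logical matrices element-wise, then test for at least one TRUE
  (Reduce(`+`, exceeds) > 0L) + 0L
}

dfs <- list(df1, df2, df3)
flag_any_above(dfs)
```

Only one running sum is held alongside the individual comparisons, so nothing has to be merged into a single large object first.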
The idea here is to check whether each value in each data set is greater than 3, then sum up the TRUE results, check whether the sum is > 0, and convert the result to integers by adding 0L. This works because + and > are generic functions which have a data.frame method that preserves the dimensions of the data set; see ?Ops and, more specifically, methods(Ops).
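To make the steps concrete, here is a sketch that splits the one-liner into named intermediates. The variable names (hits, any_hit, result) are only illustrative, and the data is typed in from the tables printed in the question rather than regenerated with sample(). The final as.data.frame() call is an optional extra, in case a data frame like the newdf in the question is wanted instead of a matrix.

```r
# Data copied from the tables printed in the question
df1 <- data.frame(V1 = c(2, 2, 3, 4, 1), V2 = c(4, 4, 3, 3, 1), V3 = c(1, 1, 3, 2, 4))
df2 <- data.frame(V1 = c(1, 3, 3, 1, 4), V2 = c(4, 1, 4, 2, 3), V3 = c(3, 1, 4, 1, 2))
df3 <- data.frame(V1 = c(1, 4, 2, 2, 3), V2 = c(3, 1, 2, 3, 3), V3 = c(3, 3, 3, 3, 4))

# Step 1: each comparison gives a logical matrix; adding them counts,
# per position, how many of the data sets exceed 3
hits <- (df1 > 3L) + (df2 > 3L) + (df3 > 3L)

# Step 2: TRUE wherever at least one data set exceeded 3
any_hit <- hits > 0L

# Step 3: adding 0L turns TRUE/FALSE into 1/0
result <- any_hit + 0L

# Optional: convert the matrix back to a data frame
newdf <- as.data.frame(result)
newdf
```

For this data, hits holds a 2 in row 1 of V2 because both df1 and df2 have a 4 there, which is why the final comparison against 0 is needed to get a strict 0/1 result.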