Search code examples
rgroup-byfilteringcalculated-columns

Calculate sum column filtering identical values on X number of columns


I'm trying to calculate a sum column where the sum is calculated for each row from rows that have identical values in X number of columns.

In the example I want to sum values from each row (from value column to result column) if at least 2 values in columns S1, S2 or S3 are identical to values in the row being calculated. For row 1 values would be summed from rows 1, 2, 3 and 5.

S1 <- c(1,1,1,0,1,0)
S2 <- c(1,1,1,0,0,0)
S3 <- c(1,0,0,0,1,1)
value <- c(9,5,3,2,4,1)
result <- c(21,17,17,3,14,7)
df <- data.frame(S1,S2,S3,value,result)
df

  S1 S2 S3 value result
1  1  1  1     9     21
2  1  1  0     5     17
3  1  1  0     3     17
4  0  0  0     2      3
5  1  0  1     4     14
6  0  0  1     1      7

I couldn't get group_by to work if all conditions don't match. I'm looking for a solution that can be used for larger amount of rows and columns.

This question is related to my previous question where sum was calculated if all columns matched.

Calculate sum column filtering identical values on multiple columns

Edit: typos.


Solution

  • Comparing each S1:S3 row sequentially with all S1:S3 rows and applying the conditional check.

    cbind(df, result = sapply(1:nrow(df), function(x) 
      sum(df$value[colSums(unlist(df[x,1:3]) == t(as.matrix(df[,1:3]))) >= 2])))
      S1 S2 S3 value result
    1  1  1  1     9     21
    2  1  1  0     5     17
    3  1  1  0     3     17
    4  0  0  0     2      3
    5  1  0  1     4     14
    6  0  0  1     1      7
    

    Data

    df <- structure(list(S1 = c(1, 1, 1, 0, 1, 0), S2 = c(1, 1, 1, 0, 0, 
    0), S3 = c(1, 0, 0, 0, 1, 1), value = c(9, 5, 3, 2, 4, 1)), class = "data.frame", 
    row.names = c(NA, -6L))