EXCEL SUMIFS IN R using ddply


I'm trying to implement SUMIFS logic in R for the following table, which I have already implemented in Excel (columns A, B and C respectively):

ID  x   y
123 3   50
123 2   -10
123 4   -20
124 3   50
124 2   -10
124 4   -20

  1. Create a new vector z using the formula:

=IF(C2>0;(SUMIFS($C$2:$C$7;$A$2:$A$7;A2;$B$2:$B$7;">="&B2));C2)

  2. Copy the formula down to populate the vector z and it works.

  3. For each row, if "y" > 0 the formula sums the values of "y" over the rows where the ID is the same and the value of "x" is greater than or equal to the value of "x" in the current row; otherwise it returns "y" unchanged.

In R, I have tried to write a ddply call, but I'm struggling to embed a for loop in it. This is where I am so far:

ddply(test,.(ID,x), mutate, z = function(y))

I have also tried "summarise", but the problem of applying the conditions remains. I would appreciate any help in defining function(y) so that I get a result equivalent to Excel's. Many thanks.


Solution

  • This is a bit brute force and probably not the best approach, but it replicates the Excel output:

    # sample data:
    df <- read.table(text = 'ID  x   y
                     123 3   50
                     123 2   -10
                     123 4   -20
                     124 3   50
                     124 2   -10
                     124 4   -20', header = TRUE)
    
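    # create a new column called 'sum_y' using the defined rules:
    df$sum_y <- NA_real_  # initialise the column before filling it row by row
    for (i in seq_len(nrow(df))) {
      # y > 0: sum y over rows with the same ID and x >= the current x; otherwise keep y
      if (df$y[i] > 0) df$sum_y[i] <- sum(df$y[df$x >= df$x[i] & df$ID == df$ID[i]])
      else df$sum_y[i] <- df$y[i]
    }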
    
    # view the output:
    df
       ID x   y sum_y
    1 123 3  50    30
    2 123 2 -10   -10
    3 123 4 -20   -20
    4 124 3  50    30
    5 124 2 -10   -10
    6 124 4 -20   -20
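
The same rule can also be written without an explicit for loop. The following is only a sketch in base R, not the original answer's method: the helper name `sumifs_row` and the output column `z` are made up for illustration, and the data frame is rebuilt with `data.frame()` so the block runs on its own.

```r
# Sample data, same values as the table in the question
df <- data.frame(
  ID = c(123, 123, 123, 124, 124, 124),
  x  = c(3, 2, 4, 3, 2, 4),
  y  = c(50, -10, -20, 50, -10, -20)
)

# Hypothetical helper: applies the SUMIFS-style rule to row i of data frame d.
# If y > 0, sum y over rows with the same ID and x >= the current x;
# otherwise return y unchanged.
sumifs_row <- function(i, d) {
  if (d$y[i] > 0) {
    sum(d$y[d$ID == d$ID[i] & d$x >= d$x[i]])
  } else {
    d$y[i]
  }
}

# Apply the helper to every row index; vapply guarantees a numeric vector
df$z <- vapply(seq_len(nrow(df)), sumifs_row, numeric(1), d = df)

df
```

Like the loop, this compares each row against every other row, so it is unlikely to be faster on large tables; the main gain is that the rule lives in one small function that can be checked on its own.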