I'm trying to implement SUMIFS logic in R for the following table, which I have already implemented in Excel (columns A, B and C respectively):
ID x y
123 3 50
123 2 -10
123 4 -20
124 3 50
124 2 -10
124 4 -20
=IF(C2>0;(SUMIFS($C$2:$C$7;$A$2:$A$7;A2;$B$2:$B$7;">="&B2));C2)
Copy the formula down to populate the vector z and it works.
For each row, if "y" > 0, the formula sums the values of "y" over all rows where the ID is the same and the value of "x" is greater than or equal to the value of "x" in the current row. Otherwise it simply returns "y".
In R, I have tried to write a ddply call, but I'm struggling to embed a for loop within it. Here is where I'm at:
ddply(test,.(ID,x), mutate, z = function(y))
I have also tried "summarise", but the problem of the conditions remains. I would appreciate any help in defining function(y) so that I get a result equivalent to Excel's. Many thanks.
This is a bit brute force and probably not the best approach, but it replicates the Excel output:
# sample data:
df <- read.table(text = 'ID x y
123 3 50
123 2 -10
123 4 -20
124 3 50
124 2 -10
124 4 -20', header = TRUE)
# create a new column called 'sum_y' using the defined rules:
df$sum_y <- NA_real_
for (i in seq_len(nrow(df))) {
  if (df$y[i] > 0) {
    df$sum_y[i] <- sum(df$y[df$x >= df$x[i] & df$ID == df$ID[i]])
  } else {
    df$sum_y[i] <- df$y[i]
  }
}
# view the output:
df
   ID x   y sum_y
1 123 3  50    30
2 123 2 -10   -10
3 123 4 -20   -20
4 124 3  50    30
5 124 2 -10   -10
6 124 4 -20   -20
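If you would rather avoid the explicit for loop, the same rule can probably be written with vapply in base R. This is only a sketch under the assumption that the data looks like the sample above. The helper name z_for_row and the output column z are my own choices for illustration, not something from your code.

```r
# Same sample data as above.
df <- read.table(text = 'ID x y
123 3 50
123 2 -10
123 4 -20
124 3 50
124 2 -10
124 4 -20', header = TRUE)

# Hypothetical helper: value of z for row i, mirroring the Excel IF/SUMIFS.
z_for_row <- function(i) {
  if (df$y[i] > 0) {
    # sum y over rows with the same ID and x >= the current row's x
    sum(df$y[df$ID == df$ID[i] & df$x >= df$x[i]])
  } else {
    # non-positive y is passed through unchanged
    df$y[i]
  }
}

# Apply the helper to every row index; numeric(1) declares one number per row.
df$z <- vapply(seq_len(nrow(df)), z_for_row, numeric(1))
```

This still compares every row against every other row, so it is no faster than the loop on large data. It just keeps the rule in one small function that is easy to change if the conditions change.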