I have some data like the following data frame:
df = data.frame(name=c(rep("Group1",5),rep("Group2",5)),
mon=c(1,2,3,4,5,1,2,3,4,5),
amt=c(10,NA,NA,NA,NA,15,NA,NA,NA,NA),
inc=c(.1,.1,.2,.2,.3,.1,.1,.2,.2,.3))
df
name mon amt inc
1 Group1 1 10 0.1
2 Group1 2 NA 0.1
3 Group1 3 NA 0.2
4 Group1 4 NA 0.2
5 Group1 5 NA 0.3
6 Group2 1 15 0.1
7 Group2 2 NA 0.1
8 Group2 3 NA 0.2
9 Group2 4 NA 0.2
10 Group2 5 NA 0.3
There are multiple groups and the first amount column (amt) is populated but the rest in the group are null. I have another column (inc) that is the percent I want to increase the amount field, if it is currently null. When I get to the next group, I want to restart the process.
For example, Group1 mon 2 would be 10*(1+0.1)=11, mon 3 would be a 20% increase on the previously calculated mon 2 value: 11*(1+0.2)=13.2 and so on.
I have been trying to use dplyr and the lag function but I either can't get it work or it's not built for that and I don't know what kind of function would work. I was able to use a loop to iterate over the data frame and get the results I wanted but I was hoping to do it a bit more elegantly to make me a better coder with R.
Below is the loop I built that works currently:
for (i in 1:nrow(df)){
df$amt[i] = ifelse(df$mon[i]==1,
df$amt[i],
df$amt[i-1]*(1+df$inc[i]))
}
df
name mon amt inc
1 Group1 1 10.000 0.1
2 Group1 2 11.000 0.1
3 Group1 3 13.200 0.2
4 Group1 4 15.840 0.2
5 Group1 5 20.592 0.3
6 Group2 1 15.000 0.1
7 Group2 2 16.500 0.1
8 Group2 3 19.800 0.2
9 Group2 4 23.760 0.2
10 Group2 5 30.888 0.3
The issue you're probably running into with dplyr::lag()
in that it lag(amt)
will give you the lag()
of the column as it is, with the NA
values, rather than a vector which updates as you populate each value iteratively.
You can achieve that behaviour with the Reduce()
function (see below). However, in this case you can just multiply the first amt
value by the cumulative product by group using cumprod()
.
df |>
mutate(
amt = amt[1] * c(1, cumprod(1 + tail(inc, -1))),
.by = name
)
# name mon amt inc
# 1 Group1 1 10.000 0.1
# 2 Group1 2 11.000 0.1
# 3 Group1 3 13.200 0.2
# 4 Group1 4 15.840 0.2
# 5 Group1 5 20.592 0.3
# 6 Group2 1 15.000 0.1
# 7 Group2 2 16.500 0.1
# 8 Group2 3 19.800 0.2
# 9 Group2 4 23.760 0.2
# 10 Group2 5 30.888 0.3
Reduce()
Here is a similar approach to achieve the same result using Reduce()
. This will be slower but perhaps gives a better sense of what is happening step-by-step.
df |>
split(~name) |>
lapply(\(dat)
transform(dat, amt = Reduce(
\(x, y) x * (1 + y), tail(dat$inc, -1),
init = dat$amt[1], accumulate = TRUE
))) |>
do.call(rbind, args = _)