I'd really appreciate some help with an issue I have with my R dataframe. Couldn't find a similar thread, so please share if it exists already!
I have the following data:
mydata <- data.frame(inflow=c(50,60,55,70,80),
outflow=c(70,80,70,65,65),
current=c(100,100,100,100,100))
I want to create a new column which does something like:
mutate(calc=pmax(lag(calc,default=current)+inflow-outflow,inflow))
which basically creates a new column called calc that holds the larger of a) the previous row's value of calc plus this row's inflow minus outflow, or b) this row's inflow value. pmax is a base R function that returns the element-wise maximum of the vectors it is given, i.e. the maximum across the given columns per row.
so my results will be: row1 = max(100+50-70, 50) which is 80, row2 = max(80+60-80,60) which is 60 and so on.
The main issue is that the lag function doesn't allow for taking previous row values for the same column you're creating, it has to be a column that already exists in the data. I thought of doing it in steps by creating the calc column first and then adding a second calculation step, but can't exactly work it out.
Lastly, I know that using a for loop might be a solution, but I was wondering if there is a different way? My data is grouped by an extra column, and I'm not sure a for loop will work well with grouped rows.
Thanks for any help :)
library(dplyr)
library(purrr)

# I don't define the current column, as this is handled with the .init argument of accumulate2
mydata <- data.frame(
  inflow = c(50, 60, 55, 70, 80),
  outflow = c(70, 80, 70, 65, 65)
)

# define your recursive function: the first argument is the previous result
flow_function <- function(current, inflow, outflow) {
  pmax(inflow, inflow - outflow + current)
}

# [-1] drops the initial value, unlist turns the result back into a vector
mydata %>%
  mutate(result = accumulate2(inflow, outflow, flow_function, .init = 100)[-1] %>% unlist)
# inflow outflow result
# 1 50 70 80
# 2 60 80 60
# 3 55 70 55
# 4 70 65 70
# 5 80 65 85
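Since the question mentions grouped data, here is a minimal sketch of how the same approach might be applied per group. The grouping column grp and the per-group current values are made up for illustration; the sketch assumes the starting value for each group is the first value of current in that group.

```r
library(dplyr)
library(purrr)

# Hypothetical grouped data: `grp` and the `current` values are assumptions
grouped_data <- data.frame(
  grp     = c("a", "a", "a", "b", "b"),
  inflow  = c(50, 60, 55, 70, 80),
  outflow = c(70, 80, 70, 65, 65),
  current = c(100, 100, 100, 200, 200)
)

# Same recursive function as above: first argument is the previous result
flow_function <- function(current, inflow, outflow) {
  pmax(inflow, inflow - outflow + current)
}

grouped_result <- grouped_data %>%
  group_by(grp) %>%
  mutate(
    # the recursion restarts in each group, seeded with that group's first `current`
    calc = unlist(
      accumulate2(inflow, outflow, flow_function, .init = first(current))
    )[-1]
  ) %>%
  ungroup()

grouped_result$calc
# 80 60 55 205 220
```

Because mutate evaluates the expression once per group, no explicit loop over groups should be needed.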
Detail
The purrr::accumulate family of functions is designed to perform recursive calculations. accumulate can handle functions which take the previous value plus values from one other column, whilst accumulate2 allows for a second additional column. Your scenario falls into the latter.
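To illustrate the simpler one-column case, here is a small sketch using accumulate to build a running balance from inflow only. The opening balance of 100 and the variable names are chosen just for this example.

```r
library(purrr)

inflow <- c(50, 60, 55, 70, 80)

# Each step receives the previous result (balance) and the next inflow value (x);
# .init = 100 is an assumed opening balance
running_balance <- accumulate(inflow, function(balance, x) balance + x, .init = 100)

# unlist is a no-op if the result is already a plain vector
running_balance <- unlist(running_balance)
running_balance
# 100 150 210 265 335 415
```

The result has one more element than inflow because the initial value is included as the first entry.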
accumulate2 expects the following arguments:
- .x - the first column for the calculation.
- .y - the second column for the calculation.
- .f - the function to apply recursively: this should have three arguments, the first of which is the recursive argument.
- .init - (optional) the initial value to use as the first argument.
So in your case the function to pass to .f will be:
# define your recursive function
flow_function <- function(current, inflow, outflow){
pmax(inflow, inflow - outflow + current)
}
We first test what this produces outside of a dplyr::mutate:
# note I don't define the current column, as this is handled with the .init argument
mydata <- data.frame(
inflow=c(50,60,55,70,80),
outflow=c(70,80,70,65,65)
)
purrr::accumulate2(mydata$inflow, mydata$outflow, flow_function, .init = 100)
# returns
# [[1]]
# [1] 100
#
# [[2]]
# [1] 80
#
# [[3]]
# [1] 60
#
# [[4]]
# [1] 55
#
# [[5]]
# [1] 70
#
# [[6]]
# [1] 85
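So there are two things to note about the returned value:
- It has one more entry than the input columns, because the first entry is the initial value (100), so we drop it with [-1].
- It is a list, so we unlist it back to a vector.
These two final steps are brought together in the full example at the top.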
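As a cross-check, the same recursion can be written as a plain base R loop. This is only a sketch to show what accumulate2 is doing step by step, not a replacement for the approach above; the variable names are chosen for this example.

```r
inflow  <- c(50, 60, 55, 70, 80)
outflow <- c(70, 80, 70, 65, 65)

calc <- numeric(length(inflow))
previous <- 100  # plays the role of .init

for (i in seq_along(inflow)) {
  # larger of: this row's inflow, or previous calc plus inflow minus outflow
  calc[i] <- max(inflow[i], previous + inflow[i] - outflow[i])
  previous <- calc[i]
}

calc
# 80 60 55 70 85
```

The values match the result column from the accumulate2 version.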