Use the data below to make the cumsum_a column look like the should column.
Data to start with:
> demo
    th seq group
1 20.1   1    10
2 24.1   2    10
3 26.1   3    10
4  1.1   1    20
5  2.1   2    20
6  4.1   3    20
The "should" column below is the goal.
demo <- data.frame(th = c(20.1, 24.1, 26.1, 1.1, 2.1, 4.1),
                   seq = c(1:3, 1:3),
                   group = c(rep(10, 3), rep(20, 3)))
library(magrittr)
library(dplyr)
demo %>%
  group_by(group) %>%
  mutate(
    cumsum_a = cumsum(group^seq * (th / cummax(th)))
  ) %>%
  ungroup() %>%
  mutate(
    cumsum_m = c( # As an example only, this manually does exactly what cumsum_a is doing (which is wrong)
      10^1*20.1/20.1, # good
      10^1*20.1/20.1 + 10^2*24.1/24.1, # different denominators, bad
      10^1*20.1/20.1 + 10^2*24.1/24.1 + 10^3*26.1/26.1, # different denominators, bad
      20^1*1.1/1.1, # good
      20^1*1.1/1.1 + 20^2*2.1/2.1, # different denominators, bad
      20^1*1.1/1.1 + 20^2*2.1/2.1 + 20^3*4.1/4.1 # different denominators, bad
    ),
    should = c( # this is exactly the kind of calculation I want
      10^1*20.1/20.1, # good
      10^1*20.1/24.1 + 10^2*24.1/24.1, # good
      10^1*20.1/26.1 + 10^2*24.1/26.1 + 10^3*26.1/26.1, # good
      20^1*1.1/1.1, # good
      20^1*1.1/2.1 + 20^2*2.1/2.1, # good
      20^1*1.1/4.1 + 20^2*2.1/4.1 + 20^3*4.1/4.1 # good
    )
  )
Put most simply, every term in a row needs the same denominator: 24.1 and 24.1 rather than 20.1 and 24.1 in the second row of cumsum_m (or in the underlying calculation for cumsum_a).
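Written as a formula, the goal can be sketched as follows. The notation is introduced here only for illustration: within one group, $g$ is the group value, $s_j$ is seq and $t_j$ is th for row $j$, and $i$ is the current row.

$$
\text{cumsum\_a}_i = \sum_{j=1}^{i} \frac{g^{s_j}\, t_j}{\max_{k \le j} t_k},
\qquad
\text{should}_i = \frac{1}{\max_{k \le i} t_k} \sum_{j=1}^{i} g^{s_j}\, t_j
$$

In the first expression each term keeps the running maximum from its own row; in the second, all terms in row $i$ are divided by the running maximum up to row $i$.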
Here are the new columns, where should is what cumsum_a or cumsum_m should be.
     th   seq group cumsum_a cumsum_m should
  <dbl> <int> <dbl>    <dbl>    <dbl>  <dbl>
1  20.1     1    10       10       10    10
2  24.1     2    10      110      110   108.
3  26.1     3    10     1110     1110  1100.
4   1.1     1    20       20       20    20
5   2.1     2    20      420      420   410.
6   4.1     3    20     8420     8420  8210.
You can use the following solution.
purrr::accumulate takes a two-argument function: the first argument, represented by .x or ..1, is the accumulated value from the previous iterations, and .y represents the current value of our vector (2:n()). Our first accumulated value is the first element of group, which I supplied as the .init argument. At each step, .x is multiplied by the ratio of the previous value of cmax to the current value of cmax, which rescales the running total to the new denominator, and the current row's term is then added.
I think the rest is pretty clear, but if you have any more questions about it, just let me know.
library(dplyr)
library(purrr)
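demo %>%
  group_by(group) %>%
  mutate(
    cmax = cummax(th),  # running maximum of th within each group
    should = accumulate(
      2:n(),              # row indices 2..n (assumes at least two rows per group)
      .init = group[1],   # first row's value; equals group^seq * th / cmax when seq starts at 1
      # rescale the running total to the new denominator, then add the current row's term
      ~ (.x * cmax[.y - 1] / cmax[.y]) + (group[.y] ^ seq[.y]) * (th[.y] / cmax[.y])
    )
  )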
# A tibble: 6 x 5
# Groups: group [2]
     th   seq group  cmax should
  <dbl> <int> <dbl> <dbl>  <dbl>
1  20.1     1    10  20.1    10
2  24.1     2    10  24.1   108.
3  26.1     3    10  26.1  1100.
4   1.1     1    20   1.1    20
5   2.1     2    20   2.1   410.
6   4.1     3    20   4.1  8210.
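As a cross-check on the output above, the same column can apparently be obtained without accumulate: every term in a row shares the denominator cummax(th), so it factors out of the running sum. Below is a minimal sketch in base R, assuming th is strictly positive (so the division is safe) and that demo is built as in the question. The names should_closed, should_loop and recur are introduced here for illustration only and are not part of the original answer.

```r
# Rebuild the example data from the question.
demo <- data.frame(
  th    = c(20.1, 24.1, 26.1, 1.1, 2.1, 4.1),
  seq   = c(1:3, 1:3),
  group = c(rep(10, 3), rep(20, 3))
)

# Closed form: cummax(th) is common to all terms of a row,
# so it can be pulled out of the per-group running sum.
num  <- ave(demo$group^demo$seq * demo$th, demo$group, FUN = cumsum)
cmax <- ave(demo$th, demo$group, FUN = cummax)
demo$should_closed <- num / cmax

# The recurrence from the accumulate() call, spelled out as a loop
# over the rows of a single group (hypothetical helper).
recur <- function(t, s, g) {
  cm  <- cummax(t)
  out <- numeric(length(t))
  out[1] <- g[1]^s[1] * t[1] / cm[1]
  for (i in seq_along(t)[-1]) {
    # rescale the previous total to the new denominator, then add the new term
    out[i] <- out[i - 1] * cm[i - 1] / cm[i] + g[i]^s[i] * t[i] / cm[i]
  }
  out
}

pieces <- lapply(split(demo, demo$group),
                 function(d) recur(d$th, d$seq, d$group))
demo$should_loop <- unsplit(pieces, demo$group)

print(demo)
```

With dplyr loaded, the closed form would read mutate(should = cumsum(group^seq * th) / cummax(th)) after group_by(group). Unlike the 2:n() index in the accumulate version, both variants here also handle a group with a single row.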