I have a data like this in R
x <- c(1,2,2,3,4,4,7,8)
y <- c(300,200,200,150,100,100,30,20)
df <- data.frame(x, y)
The cumulative with the dataset is
cum_df <- data.frame(x, y, Y)
> cum_df
x y Y
1 1 300 300
2 2 200 500
3 2 200 700
4 3 150 850
5 4 100 950
6 4 100 1050
7 7 30 1080
8 8 20 1100
The cumulative of "y" using cumsum(y) is:
Y <- cumsum(y)
> Y
[1] 300 500 700 850 950 1050 1080 1100
Instead, I want the cumulative of "y" to be like this
> Y
[1] 300 500 500 650 750 750 780 800
In essence, it does not compute repeated values of y. How do I go about this in R? I have tried different functions but it seem not to work. I want the answer to look like this
> ans
x y Y
1 1 300 300
2 2 200 500
3 2 200 500
4 3 150 650
5 4 100 750
6 4 100 750
7 7 30 780
8 8 20 800
We can get the distinct
rows, do the cumsum
and then do a join
library(dplyr)
df %>%
distinct() %>%
mutate(Y = cumsum(y)) %>%
right_join(df)
# x y Y
#1 1 300 300
#2 2 200 500
#3 2 200 500
#4 3 150 650
#5 4 100 750
#6 4 100 750
#7 7 30 780
#8 8 20 800
Or without any join by replacing the duplicated values in 'y' with 0, and then do the cumsum
df %>%
mutate(Y = cumsum(y * !duplicated(y)))
# x y Y
#1 1 300 300
#2 2 200 500
#3 2 200 500
#4 3 150 650
#5 4 100 750
#6 4 100 750
#7 7 30 780
#8 8 20 800
Or in base R
df$Y <- with(df, cumsum(y * !duplicated(y)))