Similar to this post Calculate cummean() and cumsd() while ignoring NA values and filling NAs but I would rather stick to tidyverse
My data:
structure(list(season = c("Winter", "Winter", "Winter", "Winter",
"Winter", "Winter", "Winter", "Winter", "Winter", "Spring", "Spring",
"Spring", "Spring", "Spring", "Spring", "Spring", "Spring", "Spring"
), tmean = c(NA, 2, 3, 4, NA, NA, NA, 8, NA, 7, 8, 9, NA, NA,
5, 3, 2, NA)), class = "data.frame", row.names = c(NA, -18L))
season tmean
Winter NA
Winter 2
Winter 3
Winter 4
Winter NA
Winter NA
Winter NA
Winter 8
Winter NA
Spring 7
Spring 8
Spring 9
Spring NA
Spring NA
Spring 5
Spring 3
Spring 2
Spring NA
What I want:
season tmean cumtmean
Winter NA NA
Winter 2 2
Winter 3 2.5
Winter 4 3
Winter NA 3
Winter NA 3
Winter NA 3
Winter 8 4.25
Winter NA 4.25
Spring 7 7
Spring 8 7.5
Spring 9 8
Spring NA 8
Spring NA 8
Spring 5 7.25
Spring 3 6.4
Spring 2 5.66
Spring NA 5.66
You can use replace
+ cummean
to compute the cumulative mean, and fill
to fill NAs downward:
library(dplyr)
library(tidyr)
df |>
group_by(season) |>
mutate(cumtmean = replace(tmean, complete.cases(tmean), cummean(na.omit(tmean)))) |>
fill(cumtmean) |>
ungroup()
# # A tibble: 18 × 3
# season tmean cumtmean
# <chr> <dbl> <dbl>
# 1 Winter NA NA
# 2 Winter 2 2
# 3 Winter 3 2.5
# 4 Winter 4 3
# 5 Winter NA 3
# 6 Winter NA 3
# 7 Winter NA 3
# 8 Winter 8 4.25
# 9 Winter NA 4.25
# 10 Spring 7 7
# 11 Spring 8 7.5
# 12 Spring 9 8
# 13 Spring NA 8
# 14 Spring NA 8
# 15 Spring 5 7.25
# 16 Spring 3 6.4
# 17 Spring 2 5.67
# 18 Spring NA 5.67