R: How to calculate cumulative mean over first N rows and fill result down column?


I'd like to create a new column that holds the cumulative mean of the first 3 observations of another column, then carries the last of those values down the rest of the new column. For example, if my data frame has 10 rows and the cumulative mean after the first 3 observations is 64, I'd like 64 filled in for rows 4 to 10.

I'm hoping for a solution that works in dplyr.

library(tidyverse)

set.seed(1)
dat <- data.frame(var1 = round(rnorm(10, 100, 20)))

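# Attempt: this errors, because cummean(head(var1, 3)) has length 3
# while mutate() needs a result of length 10 (or 1)
dat <- dat %>%
  mutate(var1_mean = cummean(head(var1, 3)))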

Solution

  • One way:

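    dat |>
      # cumulative mean for rows 1-3, NA elsewhere (NA_real_ keeps the column double)
      mutate(var1_mean = if_else(row_number() <= 3, cummean(var1), NA_real_)) |>
      # tidyr::fill() carries the last non-NA value down the column
      fill(var1_mean)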
    

    Result

       var1 var1_mean
    1    87  87.00000
    2   104  95.50000
    3    83  91.33333
    4   132  91.33333
    5   107  91.33333
    6    84  91.33333
    7   110  91.33333
    8   115  91.33333
    9   112  91.33333
    10   94  91.33333
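
  • Another way, offered as a sketch rather than part of the original answer: skip the NA-and-fill step by indexing into the cumulative mean with a row index capped at 3. The name `n_first` is a hypothetical helper introduced here for illustration, and only dplyr is needed.

```r
library(dplyr)

set.seed(1)
dat <- data.frame(var1 = round(rnorm(10, 100, 20)))

n_first <- 3  # hypothetical name: how many leading rows the cumulative mean covers

res <- dat |>
  # pmin() caps the row index at n_first, so every row past n_first
  # reuses the cumulative mean stored at row n_first
  mutate(var1_mean = cummean(var1)[pmin(row_number(), n_first)])

print(res)
```

    With the same seed this should reproduce the table above. Note that it computes the cumulative mean over the whole column before indexing, which is fine for small data.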