Search code examples
rdataframesumvectorization

Stepwise column sum in data frame based on another column in R


I have a data frame like this:

Team GF
A 3
B 5
A 2
A 3
B 1
B 6

Looking for output like this (just an additional column):

Team x avg(X)
A 3 0
B 5 0
A 2 3
A 3 2.5
B 1 5
B 6 3

avg(x) is the average of all previous instances of x where Team is the same. I have the following R code which gets the overall average, however I'm looking for the "step-wise" average.

    new_df <- df %>% group_by(Team) %>% summarise(avg_x = mean(x))

Is there a way to vectorize this while only evaluating the previous rows on each "iteration"?


Solution

  • You want the cummean() function from dplyr, combined with lag():

    df %>% group_by(Team) %>% mutate(avg_x = replace_na(lag(cummean(x)), 0))
    

    Producing the following:

    # A tibble: 6 × 3
    # Groups:   Team [2]
      Team      x avg_x
      <chr> <dbl> <dbl>
    1 A         3   0
    2 B         5   0
    3 A         2   3
    4 A         3   2.5
    5 B         1   5
    6 B         6   3
    

    As required.

    Edit 1:

    As @Ritchie Sacramento pointed out, the following is cleaner and clearer:

    df %>% group_by(Team) %>% mutate(avg_x = lag(cummean(x), default = 0))