
Use grouped summary to operate in another data.frame column by factor


I want to compute a summary of a grouped data.frame, for example:

library(dplyr)
df_summ = mtcars %>% group_by(am) %>% summarise(mean_mpg=mean(mpg))

     am mean_mpg
  (dbl)    (dbl)
1     0 17.14737
2     1 24.39231

I then want to use this summary to transform another data.frame that shares the same factor levels but has a different number of rows. For example, I want to calculate the absolute difference between each single value and its group's mean.

Here's a toy example:

toy=data.frame(am=c(1,1,0,0),mpg=c(1,2,3,4))

The calculation I would like to do would be y = abs(toy$mpg- df_summ$mean_mpg) by factor.

My head tells me dplyr must be able to do this but I can't come up with a way. I want to keep the original data.frame (as in, using mtcars %>% group_by(am) %>% mutate(...) )

The expected output looks like this:

toy
  am mpg expected
1  1   1 23.39231
2  1   2 22.39231
3  0   3 14.14737
4  0   4 13.14737

Solution

  • Join the two data frames and then perform the calculation:

    toy %>% 
        left_join(df_summ) %>% 
        mutate(y = abs(mpg - mean_mpg))
    

    giving:

    Joining, by = "am"
      am mpg mean_mpg        y
    1  1   1 24.39231 23.39231
    2  1   2 24.39231 22.39231
    3  0   3 17.14737 14.14737
    4  0   4 17.14737 13.14737
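
For reference, here is a self-contained sketch of the same join that can be pasted into a fresh session. It assumes dplyr is installed. Two things go beyond the answer above: the join key is named with by = "am", and the helper column mean_mpg is dropped at the end so the result matches the layout asked for in the question. The name result is only illustrative.

```r
library(dplyr)

# Per-group means from mtcars, as computed in the question
df_summ <- mtcars %>%
  group_by(am) %>%
  summarise(mean_mpg = mean(mpg))

# Toy data sharing the same am levels but with a different number of rows
toy <- data.frame(am = c(1, 1, 0, 0), mpg = c(1, 2, 3, 4))

# Attach each row's group mean, take the absolute difference,
# then drop the helper column (an addition to the original answer)
result <- toy %>%
  left_join(df_summ, by = "am") %>%
  mutate(y = abs(mpg - mean_mpg)) %>%
  select(-mean_mpg)

print(result)
```

Naming the key explicitly should suppress the "Joining, by" message and guards against an accidental join on other shared columns, such as mpg if the summary happened to carry one.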