Search code examples
rdplyrgroup-bysummarize

How to combine multiple summarize calls dplyr?


Given the df

ww <- data.frame(
    GM = c("A", "A", "A", "A", "A", "A",
            "B", "B", "B", "B", "B", "B",
            "C", "C", "C", "C", "C", "C"),
    stanza = rep(c("Past", "Mid", "End"), 6),   
    change = c(1, 1.1, 1.4, 1, 1.3, 1.5, 1, 1.2, 1.4,
               1.1, 1.2, 1.3, .9, 1.2, 1.3, .9, 1.3, 1.5))

I would like to calculate the mean for Past for each GM and the divide each value in 'change' by the GM specific mean. I can do this with two dplyr calls and a join function as follows:

past <- ww %>%
    group_by(GM) %>%
    filter(stanza == "Past") %>%
    summarize(past.mean = mean(change))

ww <- left_join(ww, past, by = "GM")

ww %>%
    group_by(GM, stanza) %>%
    summarize(pr.change = change/past.mean)

But there must be a way to do this in one dplyr call.


Solution

  • Using base R

    transform(ww, pr.change = change/ave(replace(change,
        stanza != 'Past', NA), GM, FUN = function(x) mean(x, na.rm = TRUE)))
    

    -output

     GM stanza change pr.change
    1   A   Past    1.0  1.000000
    2   A    Mid    1.1  1.100000
    3   A    End    1.4  1.400000
    4   A   Past    1.0  1.000000
    5   A    Mid    1.3  1.300000
    6   A    End    1.5  1.500000
    7   B   Past    1.0  0.952381
    8   B    Mid    1.2  1.142857
    9   B    End    1.4  1.333333
    10  B   Past    1.1  1.047619
    11  B    Mid    1.2  1.142857
    12  B    End    1.3  1.238095
    13  C   Past    0.9  1.000000
    14  C    Mid    1.2  1.333333
    15  C    End    1.3  1.444444
    16  C   Past    0.9  1.000000
    17  C    Mid    1.3  1.444444
    18  C    End    1.5  1.666667