Search code examples
rdplyrtidyversegroupingsummarize

What's the easiest way to summarize by group and for the whole sample?


Suppose I have data that looks like this:

   Date       time     price minute FOMC  Daily.Return
   <date>     <time>   <dbl>  <dbl> <fct>        <dbl>
 1 2005-01-03 16:00:00  120.    960 FALSE       -1.24 
 2 2005-01-04 16:00:00  119.    960 FALSE       -1.44 
 3 2005-01-05 16:00:00  118.    960 FALSE       -0.354
 4 2005-01-06 16:00:01  119.    960 FALSE        0.245
 5 2005-01-07 15:59:00  119.    959 FALSE       -0.328
 6 2005-01-10 16:00:00  119.    960 FALSE        0.506
 7 2005-01-11 16:00:00  118.    960 FALSE       -0.279
 8 2005-01-12 16:00:01  119.    960 FALSE        0.329
 9 2005-01-13 16:00:00  118.    960 FALSE       -0.787
10 2005-01-14 16:00:00  118.    960 FALSE        0.372

I want to summarize Daily.Return per group using the FOMC variable which is either TRUE or FALSE. That is easy with dplyr. I get the following:

daily.SPY %>%  group_by(FOMC) %>% 
  summarise(Mean = 100 * mean(Daily.Return),
            Median = 100 * median(Daily.Return),
            Vol = 100 * sqrt(252) * sd(Daily.Return/100))

As expected, I get the following tibble back:

FOMC      Mean Median   Vol
  <fct>    <dbl>  <dbl> <dbl>
1 FALSE  0.00551   5.24  14.9
2 TRUE  20.8       1.20  17.6

However, I would like to have a third row which would perform the same computations without grouping. It would compute the average, median and standard deviation for the whole sample, without conditioning on the group. What's the easiest way to do that within tidyverse? Thanks!


Solution

  • One option is to just row bind a duplicate of the whole data where you mutate() the FOMC variable to "ALL" so that you end up with that as a separate group when you group_by() and summarise().

    library(tidyverse)
    
    set.seed(1)
    
    daily.SPY <- tibble(
      FOMC = factor(rep(c(T, F), each = 25)),
      Daily.Return = c(cumsum(rnorm(25)), cumsum(rnorm(25)))
    )
    
    daily.SPY %>% 
      bind_rows(., mutate(., FOMC = "ALL")) %>%
      group_by(FOMC) %>% 
      summarise(Mean = 100 * mean(Daily.Return),
                Median = 100 * median(Daily.Return),
                Vol = 100 * sqrt(252) * sd(Daily.Return/100))
    #> # A tibble: 3 x 4
    #>   FOMC   Mean Median   Vol
    #>   <chr> <dbl>  <dbl> <dbl>
    #> 1 ALL    58.4  -6.57  32.3
    #> 2 FALSE -80.3 -53.6   13.9
    #> 3 TRUE  197.  151.    30.5
    

    Created on 2022-01-11 by the reprex package (v2.0.1)