Suppose I have data that looks like this:
Date time price minute FOMC Daily.Return
<date> <time> <dbl> <dbl> <fct> <dbl>
1 2005-01-03 16:00:00 120. 960 FALSE -1.24
2 2005-01-04 16:00:00 119. 960 FALSE -1.44
3 2005-01-05 16:00:00 118. 960 FALSE -0.354
4 2005-01-06 16:00:01 119. 960 FALSE 0.245
5 2005-01-07 15:59:00 119. 959 FALSE -0.328
6 2005-01-10 16:00:00 119. 960 FALSE 0.506
7 2005-01-11 16:00:00 118. 960 FALSE -0.279
8 2005-01-12 16:00:01 119. 960 FALSE 0.329
9 2005-01-13 16:00:00 118. 960 FALSE -0.787
10 2005-01-14 16:00:00 118. 960 FALSE 0.372
I want to summarize Daily.Return
per group using the FOMC
variable which is either TRUE or FALSE. That is easy with dplyr. I get the following:
daily.SPY %>% group_by(FOMC) %>%
summarise(Mean = 100 * mean(Daily.Return),
Median = 100 * median(Daily.Return),
Vol = 100 * sqrt(252) * sd(Daily.Return/100))
As expected, I get the following tibble back:
FOMC Mean Median Vol
<fct> <dbl> <dbl> <dbl>
1 FALSE 0.00551 5.24 14.9
2 TRUE 20.8 1.20 17.6
However, I would like to have a third row which would perform the same computations without grouping. It would compute the average, median and standard deviation for the whole sample, without conditioning on the group. What's the easiest way to do that within tidyverse
? Thanks!
One option is to just row bind a duplicate of the whole data where you mutate()
the FOMC
variable to "ALL"
so that you end up with that as a separate group when you group_by()
and summarise()
.
library(tidyverse)
set.seed(1)
daily.SPY <- tibble(
FOMC = factor(rep(c(T, F), each = 25)),
Daily.Return = c(cumsum(rnorm(25)), cumsum(rnorm(25)))
)
daily.SPY %>%
bind_rows(., mutate(., FOMC = "ALL")) %>%
group_by(FOMC) %>%
summarise(Mean = 100 * mean(Daily.Return),
Median = 100 * median(Daily.Return),
Vol = 100 * sqrt(252) * sd(Daily.Return/100))
#> # A tibble: 3 x 4
#> FOMC Mean Median Vol
#> <chr> <dbl> <dbl> <dbl>
#> 1 ALL 58.4 -6.57 32.3
#> 2 FALSE -80.3 -53.6 13.9
#> 3 TRUE 197. 151. 30.5
Created on 2022-01-11 by the reprex package (v2.0.1)