Tags: r, dplyr, time-series, tidyverse, lubridate

Group by weekly data and summarize by month in R with dplyr


I have a dataset of weekly mortgage rate data.

The data looks very simple:

library(tibble)
library(lubridate)

df <- tibble(
  Date = mdy(c("2/7/2008", "2/14/2008", "2/21/2008", "2/28/2008", "3/6/2008")),
  Rate = c(5.67, 5.72, 6.04, 6.24, 6.03)
)

I am trying to group it and summarize by month.

This blog post and this answer are not what I want, because they only add a month column and still keep one row per week.

They give me the output:

month         Date     summary_variable
2008-02-01  2008-02-07  5.67        
2008-02-01  2008-02-14  5.72        
2008-02-01  2008-02-21  6.04        
2008-02-01  2008-02-28  6.24    
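The difference comes down to `mutate()` versus `summarise()`. Here is a minimal sketch on the five-row sample: `mutate()` adds a month column but keeps every weekly row, while `group_by()` followed by `summarise()` collapses each month into a single row. The sketch recreates `df` with `mdy()` so it runs on its own. The column names `month` and `avg_rate` are only for illustration.

```r
library(tibble)
library(dplyr)
library(lubridate)

df <- tibble(
  Date = mdy(c("2/7/2008", "2/14/2008", "2/21/2008", "2/28/2008", "3/6/2008")),
  Rate = c(5.67, 5.72, 6.04, 6.24, 6.03)
)

# mutate() only adds a column: still one row per week (5 rows)
weekly <- df %>% mutate(month = floor_date(Date, "month"))

# group_by() + summarise() collapses each month to one row (2 rows)
monthly <- weekly %>%
  group_by(month) %>%
  summarise(avg_rate = mean(Rate))

print(nrow(weekly))
print(nrow(monthly))
```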

My desired output (ideally with Month set to the last day of each month):

Month      Average rate
2/29/2008  6
3/31/2008  6.1
4/30/2008  5.9

The rates in the output above are placeholders, not real calculations.


Solution

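  • Extract the month as a grouping column, then take the mean rate within each group. `zoo::as.yearmon()` drops the day, and `as.Date(..., frac = 1)` converts each year-month to the last day of that month.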

    library(dplyr)
    library(zoo)

    # Month: as.yearmon() drops the day; as.Date(frac = 1) maps each
    # year-month to the last day of that month
    df %>%
      group_by(Month = as.Date(as.yearmon(Date), frac = 1)) %>%
      summarise(Average_rate = mean(Rate))


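    Output (produced on the asker's full weekly series of 151 months, not on the five-row sample; on the sample, February averages 5.92 and March is 6.03):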

    # A tibble: 151 x 2
    #   Month      Average_rate
    #   <date>            <dbl>
    # 1 2008-02-29         5.92
    # 2 2008-03-31         5.97
    # 3 2008-04-30         5.92
    # 4 2008-05-31         6.04
    # 5 2008-06-30         6.32
    # 6 2008-07-31         6.43
    # 7 2008-08-31         6.48
    # 8 2008-09-30         6.04
    # 9 2008-10-31         6.2 
    #10 2008-11-30         6.09
    # … with 141 more rows
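If you want to avoid zoo, you can use a variant that relies only on lubridate. This swaps in `floor_date()` for `as.yearmon()` and is not the answer's own method. It is a sketch run on the question's five-row sample rather than the full series. Flooring to the first of the month, adding one month and subtracting one day gives the month's last day, and this also handles leap years such as February 2008.

```r
library(tibble)
library(dplyr)
library(lubridate)

df <- tibble(
  Date = mdy(c("2/7/2008", "2/14/2008", "2/21/2008", "2/28/2008", "3/6/2008")),
  Rate = c(5.67, 5.72, 6.04, 6.24, 6.03)
)

monthly <- df %>%
  # first day of the month + 1 month - 1 day = last day of the month
  group_by(Month = floor_date(Date, "month") + months(1) - days(1)) %>%
  summarise(Average_rate = mean(Rate))

print(monthly)
```

On the sample this yields two rows, 2008-02-29 with a mean of 5.9175 and 2008-03-31 with 6.03.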