I have the following kind of dataframe, with thousands of columns and rows. First column contains dates, and the following columns contain asset returns indexes corresponding to that date.
DATE | Asset_1 | Asset_2 | Asset_3 | Asset_4 |
---|---|---|---|---|
2000-01-01 | 1000 | 300 | 2900 | NA |
..... | ||||
2000-01-31 | 1100 | 350 | 2950 | NA |
2000-02-02 | 1200 | 330 | 2970 | 100 |
... | ||||
2000-02-28 | 1200 | 360 | 3000 | 200 |
2000-03-01 | 1200 | 370 | 3500 | 300 |
I want to make this into a monthly dataset by only keeping the first observation of the month.
I have come up with the following script:
library(dplyr)
library(lubridate)
monthly <- daily %>% filter(day(DATE) == 1)
However, the problem with this is that it doesnt work for months where the first day of the month is not a trading date (aka it is missing from the daily dataset).
So when I run the command, those months where the first day of the month doesn't exist are excluded from my dataset.
If the data is always ordered, you could group by year\month, then keep (slice) the first record from each group. Like:
df<-data.frame(mydate=as.Date("2023-01-01")+1:45)
library(tidyverse)
library(lubridate)
df %>%
group_by(ym=paste(year(mydate), month(mydate))) %>%
#group_by(year(mydate), month(mydate)) %>%
slice_head(n=1)