I am not entirely new to R but I could use a hand in this one :)
So, I have a dataset on around 14k different stocks over the past 20 years. Now, clearly, some of these stocks were not traded for the entire time period. I am trying to clean my data so that for any stock if in a particular month it was not traded at least one day, the entire month of data for this (only) stock will be deleted. That is, I want to keep the data for every stock only if it was traded every day in a particular month. I hope this is clear!
It will be really great if you could at least guide me to the packages/functions that I can use to do this filtering beautifully.
Thanks!
Example:
mydata <- structure(list(permno = c(10001L, 10001L, 10001L, 10001L, 10001L,
10001L, 10001L, 10001L, 10001L, 10001L), date = structure(c(7306,
7307, 7308, 7309, 7312, 7313, 7314, 7315, 7316, 7319), class = "Date"),
sic = c(4920, 4920, 4920, 4920, 4920, 4920, 4920, 4920, 4920,
4920), price = c(3.375, 3.33333333333, NA, 3.35416666667,
3.375, 3.375, 3.375, 3.375, 3.375, NA), cap = c(10347750,
10220000, 10220000, 10283875, 10347750, 10347750, 10347750,
10347750, 10347750, 10092250)), row.names = c(NA, 10L), class = "data.frame")
mydata
permno refers to a particular stock, and for each of them, I only want to keep the data for months in which there is no n/a.
Separate year, month and day from date
. For each stock in each month keep only the data if the stock is traded everyday of the month.
library(dplyr)
library(lubridate)
result <- mydata %>%
mutate(year = year(date),
month = month(date),
day = day(date)) %>%
group_by(permno, year, month) %>%
filter(all(!is.na(price)))