Search code examples
rdata-cleaningfinance

Deleting multiple rows in R based on a is.na condition


I am not entirely new to R but I could use a hand in this one :)

So, I have a dataset on around 14k different stocks over the past 20 years. Now, clearly, some of these stocks were not traded for the entire time period. I am trying to clean my data so that for any stock if in a particular month it was not traded at least one day, the entire month of data for this (only) stock will be deleted. That is, I want to keep the data for every stock only if it was traded every day in a particular month. I hope this is clear!

It will be really great if you could at least guide me to the packages/functions that I can use to do this filtering beautifully.

Thanks!

Example:

mydata <- structure(list(permno = c(10001L, 10001L, 10001L, 10001L, 10001L, 
10001L, 10001L, 10001L, 10001L, 10001L), date = structure(c(7306, 
7307, 7308, 7309, 7312, 7313, 7314, 7315, 7316, 7319), class = "Date"), 
    sic = c(4920, 4920, 4920, 4920, 4920, 4920, 4920, 4920, 4920, 
    4920), price = c(3.375, 3.33333333333, NA, 3.35416666667, 
    3.375, 3.375, 3.375, 3.375, 3.375, NA), cap = c(10347750, 
    10220000, 10220000, 10283875, 10347750, 10347750, 10347750, 
    10347750, 10347750, 10092250)), row.names = c(NA, 10L), class = "data.frame")

mydata

permno refers to a particular stock, and for each of them, I only want to keep the data for months in which there is no n/a.


Solution

  • Separate year, month and day from date. For each stock in each month keep only the data if the stock is traded everyday of the month.

    library(dplyr)
    library(lubridate)
    
    result <- mydata %>%
                mutate(year = year(date), 
                       month = month(date), 
                       day = day(date)) %>%
                group_by(permno, year, month) %>%
                filter(all(!is.na(price)))