Remove incomplete months from a data frame even when part of the month contains data


I would like to remove incomplete months from my data frame, even when part of such a month contains data.

Example data frame:

date <- seq.Date(as.Date("2016-01-15"),as.Date("2016-09-19"),by="day")
data <- 1:249

df <- data.frame(date,data)

What I would like:

date2 <- seq.Date(as.Date("2016-02-01"),as.Date("2016-08-31"),by="day")
data2 <- seq(from = 18, to = 230)

df2 <- data.frame(date2,data2)

Solution

  • If I have interpreted your question correctly, you want to keep only the months for which every day is present and drop the months that are only partly covered.

    The following uses dplyr v0.7.0:

    library(dplyr)
    
    df <- df %>%
      mutate(mo = months(date)) # add the month name (mo); note that this ignores the year
    
    complete_mo <- df %>%
      count(mo) %>% # count the days present in each month (n)
      filter(n >= 28) %>% # rule-of-thumb definition of a "complete month"
      pull(mo)
    
    df_complete_mo <- df %>%
      filter(mo %in% complete_mo) %>% # keep only the rows from complete months
      select(-mo) # drop mo to restore the original columns
    

    df_complete_mo then contains only the rows that fall in complete months.
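
A possible refinement, not part of the original answer: the `n >= 28` rule would also keep a 31-day month that has only 28 days of data, and `months()` ignores the year, so the same month from two different years would be counted together. The sketch below instead groups by year and month and keeps a group only when its number of distinct dates equals the calendar length of that month. `days_in_month()` and the `ym` column are names made up for this illustration, not dplyr functions.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  date = seq.Date(as.Date("2016-01-15"), as.Date("2016-09-19"), by = "day"),
  data = 1:249
)

# Hypothetical helper (not from dplyr): calendar length of the month of each date
days_in_month <- function(d) {
  month_start <- as.Date(format(d, "%Y-%m-01"))
  # month_start + 31 always falls in the following month, so truncating it
  # to the first of that month and stepping back one day gives the last day
  next_start <- as.Date(format(month_start + 31, "%Y-%m-01"))
  as.integer(format(next_start - 1, "%d"))
}

df_complete <- df %>%
  mutate(ym = format(date, "%Y-%m")) %>%  # year-month key, e.g. "2016-02"
  group_by(ym) %>%
  # keep a month only if every calendar day of it is present
  filter(n_distinct(date) == days_in_month(first(date))) %>%
  ungroup() %>%
  select(-ym)                             # drop the helper column
```

On the example data this should leave 213 rows, running from 2016-02-01 to 2016-08-31 with data values 18 to 230, which matches the df2 in the question. The result comes back as a tibble; wrap it in as.data.frame() if a plain data frame is needed.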