r dplyr lubridate

Doesn't detect dates


I am doing some operations with the functions isoweek and if_else, but for some dates the operation does not work and I get NA instead.

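library(haven)      # read_dta
library(dplyr)      # mutate, select, if_else, %>%
library(tidyr)      # drop_na, fill
library(lubridate)  # isoweek, ymd

db = read_dta("C:/Users/crist/Downloads/db.dta")

# Keep the first 10 characters (the date part) and convert to Date
db$date1 = substr(db$date1,1,10)
db$date1 = as.Date(db$date1)
db$date2 = substr(db$date2,1,10)
db$date2 = as.Date(db$date2)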

t = db %>% 
  drop_na(value) %>% 
  mutate(Weekday = weekdays(date2),
         date2 = replace(date2, Weekday %in% c("Saturday", "Sunday"), NA),
         num_week = isoweek(date2),
         dummy_sas = if_else(date2 >= ymd("2020-05-18"),1,0)) %>% 
  fill(date2)  %>% 
  select(-Weekday) %>% 
  mutate(date2 = if_else(date2 < as.Date("2020-05-18") & SAS == 1, date1, date2))


For example, I need the week number for every row, but I get NAs in num_week:

> summary(isoweek(t$date2))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00   22.00   31.00   28.15   39.00   47.00 
> summary(t$num_week)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   1.00   21.00   31.00   28.12   39.00   47.00     109 

Data here


Solution

  • I haven't downloaded the data, but I think the order of operations is the issue here. Try this:

    library(dplyr)
    library(tidyr)      # drop_na() and fill()
    library(lubridate)
    
    db %>% 
      drop_na(value) %>% 
      mutate(Weekday = weekdays(date2),
             # compute the week number while date2 still holds the weekend dates
             num_week = isoweek(date2),
             date2 = replace(date2, Weekday %in% c("Saturday", "Sunday"), NA),
             dummy_sas = as.integer(date2 >= ymd("2020-05-18"))) %>%
      fill(date2)  %>% 
      select(-Weekday) %>% 
      mutate(date2 = if_else(date2 < as.Date("2020-05-18") & SAS == 1, date1, date2))
    

    In your code, you first replace weekend dates with NA:

    date2 = replace(date2, Weekday %in% c("Saturday", "Sunday"), NA)
    

    and only then call isoweek on date2:

    num_week = isoweek(date2)
    

    So the weekend rows get NA in num_week, because isoweek(NA) returns NA.
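
    The effect can be reproduced on a small made-up data frame. This is only a sketch: toy, bad and good are hypothetical names, the three dates are invented, and wday(..., week_start = 1) stands in for weekdays() so the result does not depend on the locale.

    ```r
    library(dplyr)
    library(lubridate)

    # Hypothetical toy data: a Friday, a Saturday and a Monday
    toy <- data.frame(date2 = as.Date(c("2020-05-15", "2020-05-16", "2020-05-18")))

    # Same order as in the question: date2 is blanked first, so isoweek() sees NA
    bad <- toy %>%
      mutate(Weekday = wday(date2, week_start = 1),    # 1 = Monday, ..., 7 = Sunday
             date2 = replace(date2, Weekday >= 6, NA),
             num_week = isoweek(date2))

    # Week number first: isoweek() still sees the original weekend date
    good <- toy %>%
      mutate(Weekday = wday(date2, week_start = 1),
             num_week = isoweek(date2),
             date2 = replace(date2, Weekday >= 6, NA))

    bad$num_week   # 20 NA 21
    good$num_week  # 20 20 21
    ```

    The arguments of mutate are evaluated in order, so each expression sees the columns as modified by the expressions before it.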

    Afterwards, fill(date2) fills in those NAs, and you replace date2 again in the last line:

    mutate(date2 = if_else(date2 < as.Date("2020-05-18") & SAS == 1, date1, date2))
    

    so you don't get any NAs in t$date2, while num_week keeps the NAs that were created earlier.
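
    If num_week should instead describe the final value of date2 (after fill and the SAS replacement), one possible variant is to compute it in the last step. This is a sketch on invented data, not the asker's db: toy, res and is_weekend are hypothetical names, and whether the week number should follow the original or the final date2 is an assumption you have to settle for your own data.

    ```r
    library(dplyr)
    library(tidyr)
    library(lubridate)

    # Hypothetical toy data with the columns used in the question
    toy <- data.frame(
      date1 = as.Date(c("2020-05-11", "2020-05-12", "2020-05-13")),
      date2 = as.Date(c("2020-05-15", "2020-05-16", "2020-05-18")),
      SAS   = c(1, 0, 0)
    )

    res <- toy %>%
      mutate(is_weekend = wday(date2, week_start = 1) >= 6,  # Saturday or Sunday
             date2 = replace(date2, is_weekend, NA)) %>%
      fill(date2) %>%                                        # carry the last weekday forward
      mutate(date2 = if_else(date2 < as.Date("2020-05-18") & SAS == 1, date1, date2),
             num_week = isoweek(date2)) %>%                  # week of the final date2
      select(-is_weekend)

    summary(res$num_week)  # no NA's column expected here
    ```

    With this ordering, summary(res$num_week) and summary(isoweek(res$date2)) agree by construction.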