Search code examples
rdatedata-cleaninglubridateutc

Standardizing Different Time Formats in Column


I have a data frame that looks like this:

data <- data.frame(PROG_START = c("Tuesday, October 1, 2019", "1682294400"))
> print(data)
                PROG_START
1 Tuesday, October 1, 2019
2               1682294400

As you can see, the PROG_START column has dates in 2 different formats (UTX timestamp and day of week, month, day, year). I would like to standardize the values in this column so that the format for all dates is day-month-year.

I've looked through Stack and haven't had much success in finding a solution. I did find one helpful post that led me to develop the following:

data1 <- data %>% 
  mutate(Dates = case_when(str_detect(PROG_START, '\\d{10}\\.\\d{3}') ~  PROG_START, TRUE ~ NA_character_  )) %>% 
  mutate(Dates = as.POSIXct(as.numeric(PROG_START), origin = '1970-01-01', tz = 'UTC')) %>% 
  mutate(PROG_START = anytime(PROG_START)) %>% mutate(PROG_START = coalesce(PROG_START, Dates)) %>% select(-Dates)

This successfully converted most of the UTX timestamps, however, a minority had the wrong year. For example, 1682294400 was ascribed the year 1682 instead of 2023. It also turned all other formatted dates (i.e., Tuesday, October 1, 2019) into NAs.

I then updated the above code to this:

data1 <- data %>% 
  mutate(PROG_START = case_when(
    str_detect(PROG_START, '\\d{10}\\.\\d{3}') ~ as.POSIXct(as.numeric(PROG_START), origin = '1970-01-01', tz = 'UTC'),
    TRUE ~ dmy(PROG_START)
  )) %>% 
  mutate(PROG_START = format(PROG_START, "%d-%m-%Y"))

However, this turned everything into NAs. Not sure how to proceed at the moment. I appreciate the help! Thank you!


Solution

  • fn <- function(dd){
       a <- lubridate::as_datetime(suppressWarnings(as.numeric(dd)))
       a[idx] <- lubridate::parse_date_time(dd[idx <- is.na(a)], "ABdY")
       a
     }
    
    fn(data$PROG_START)
    [1] "2019-10-01 UTC" "2023-04-24 UTC"