Search code examples
rdateposixlt

Concatenating Dates into a single Column


My dates in Submitted.on column are of different formats hence resorted to converting the various formats separately and then concatenating them into a new column using ifelse and as.Date. But the however when I use strptime to have the timestamp also in the new column it throws error and a warning message.

data$Submitted.on[0:5]
#[1] 02-06-02 0:00       03/30/2010 23:15:12 
#[3] 11-05-09 6:28       07/29/2009 23:07:38 
#[5] 07-10-05 0:00

vec1 = as.character(strptime(data$Submitted.on, "%m/%d/%Y %H:%M:%S"))
vec1[0:5]
#[1] NA                    "2010-03-30 23:15:12"
#[3] NA                    "2009-07-29 23:07:38"
#[5] NA     

vec2 = as.character(strptime(data$Submitted.on, "%m-%d-%y %H:%M"))
vec2[0:5]
#[1] "2002-02-06 00:00:00"  NA                   
#[3] "2009-11-05 06:28:00"  NA                   
#[5] "2005-07-10 00:00:00"

data['new_format']=as.Date(ifelse(is.na(vec1),vec2,vec1))

data[0:5,'new_format']

#[1] "2002-02-06" "2010-03-30" "2009-11-05" "2009-07-29"
#[5] "2005-07-10"

Using as.Date works great but when I use strptime for timestamp also , it gives warning message.

data['new_format']=strptime(ifelse(is.na(vec1),vec2,vec1),"%Y-%m-%d 
%H:%M:%S")

#Warning message:
#In `[<-.data.frame`(`*tmp*`, "new_format", value = list(sec = c(0,  :
#provided 11 variables to replace 1 variables

data[0:5,'new_format']
[1]  0 12  0 38  0

Any help on how to have the timestamp also will be of great help.


Solution

  • We can use parse_date_time from lubridate

    library(lubridate)
    parse_date_time(data$Submitted.on, guess_formats(data$Submitted.on, 
           c("mdy HMS", "mdy MS")))
    #[1] "2002-02-06 00:00:00 UTC" "2010-03-30 23:15:12 UTC" "2009-11-05 00:06:28 UTC" 
    #[4] "2009-07-29 23:07:38 UTC" "2005-07-10 00:00:00 UTC"
    

    Regarding the use of ifelse, we would advocate against this as strptime gives a POSIXlt class. So, instead of ifelse, one can use the indexing method

    v1 <- strptime(data$Submitted.on, "%m/%d/%Y %H:%M:%S", tz = "UTC")
    v1[is.na(v1)] <- strptime(data$Submitted.on[is.na(v1)], "%m-%d-%y %H:%M", tz = "UTC")
    v1
    #[1] "2002-02-06 00:00:00 UTC" "2010-03-30 23:15:12 UTC" "2009-11-05 06:28:00 UTC"
    #[4] "2009-07-29 23:07:38 UTC" "2005-07-10 00:00:00 UTC"
    

    data

    data <- structure(list(Submitted.on = c("02-06-02 0:00", "03/30/2010 23:15:12", 
    "11-05-09 6:28", "07/29/2009 23:07:38", "07-10-05 0:00")),
     .Names = "Submitted.on", row.names = c(NA, -5L), class = "data.frame")