
Converting character timestamp to date-time stamp in R; H:M:S keeps getting removed


I am trying to convert a character date-time stamp to a normalised date-time stamp in R, but I keep running into the same issue with a number of different solutions. Here is a sample:

timesdf<-structure(list(DateTime = c("2021-02-20 00:00:00", "2021-02-20 00:00:00", 
                                     "2021-02-20 00:00:00", "2021-02-20 00:00:00", "2021-02-20 00:00:00", 
                                     "2021-02-20 00:00:00", "2021-02-20 00:00:00", "2021-02-20 00:00:00", 
                                     "2021-02-20 00:00:00", "2021-02-20 00:00:00", "2021-02-20 00:00:00", 
                                     "2021-02-20 00:00:00", "2021-02-20 00:00:00", "2021-02-20 00:00:00", 
                                     "2021-02-20 00:00:00")), row.names = c(NA, 15L), class = "data.frame")


str(timesdf)
#'data.frame':  15 obs. of  1 variable:
#  $ DateTime: chr  "2021-02-20 00:00:00" "2021-02-20 00:00:00" "2021-02-20 00:00:00" "2021-02-20 00:00:00" ...

Here are some of the solutions that I have tried:

#lubridate solution 1
library(lubridate)
timesdf$DateTime<-ymd_hms(timesdf$DateTime)
timesdf
head(timesdf)
#    DateTime
#1 2021-02-20
#2 2021-02-20
#3 2021-02-20
#4 2021-02-20
#5 2021-02-20
#6 2021-02-20


#lubridate solution 2
timesdf$DateTime<-ymd_hms(timesdf$DateTime,tz=Sys.timezone())
timesdf
head(timesdf)
#    DateTime
#1 2021-02-20
#2 2021-02-20
#3 2021-02-20
#4 2021-02-20
#5 2021-02-20
#6 2021-02-20



#POSIXct solution 1
timesdf$DateTime<-as.POSIXct(timesdf$DateTime, "%Y/%m/%d %H:%M:%OS")
#Warning messages:
#1: In strptime(xx, f, tz = tz) : unknown timezone '%Y/%m/%d %H:%M:%OS'
#2: In as.POSIXct.POSIXlt(x) : unknown timezone '%Y/%m/%d %H:%M:%OS'
#3: In strptime(x, f, tz = tz) : unknown timezone '%Y/%m/%d %H:%M:%OS'
#4: In as.POSIXct.POSIXlt(as.POSIXlt(x, tz, ...), tz, ...) :
#  unknown timezone '%Y/%m/%d %H:%M:%OS'
head(timesdf)
#    DateTime
#1 2021-02-20
#2 2021-02-20
#3 2021-02-20
#4 2021-02-20
#5 2021-02-20
#6 2021-02-20


#POSIXct solution 2
timesdf$DateTime<-as.POSIXct(timesdf$DateTime, "%Y-%m-%d %H:%M:%0S")
#Warning messages:
#1: In strptime(xx, f, tz = tz) : unknown timezone '%Y-%m-%d %H:%M:%S'
#2: In as.POSIXct.POSIXlt(x) : unknown timezone '%Y-%m-%d %H:%M:%S'
#3: In strptime(x, f, tz = tz) : unknown timezone '%Y-%m-%d %H:%M:%S'
#4: In as.POSIXct.POSIXlt(as.POSIXlt(x, tz, ...), tz, ...) :
#  unknown timezone '%Y-%m-%d %H:%M:%S'
head(timesdf)
#    DateTime
#1 2021-02-20
#2 2021-02-20
#3 2021-02-20
#4 2021-02-20
#5 2021-02-20
#6 2021-02-20

As you can see, the H:M:S part of the timestamp is being removed, which is not what I want. I usually use lubridate to normalise date-time stamps and if that (seldom) fails then I use the as.POSIXct function. But I have no idea why the H:M:S part is being removed.

This is likely a duplicate question but I haven't found anything obvious that was similar to my issue. Any pointers would be appreciated :)


Solution

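  • As pointed out in the comments, the problem is only in how the data are printed: when every value in a POSIXct vector falls exactly at midnight, R prints just the date and omits the 00:00:00 time, but the time is still stored. To convince yourself, add 1 (one second) to the variable created by as.POSIXct: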

    timesdf<-structure(list(DateTime = c("2021-02-20 00:00:00", "2021-02-20 00:00:00", 
                                         "2021-02-20 00:00:00", "2021-02-20 00:00:00", "2021-02-20 00:00:00", 
                                         "2021-02-20 00:00:00", "2021-02-20 00:00:00", "2021-02-20 00:00:00", 
                                         "2021-02-20 00:00:00", "2021-02-20 00:00:00", "2021-02-20 00:00:00", 
                                         "2021-02-20 00:00:00", "2021-02-20 00:00:00", "2021-02-20 00:00:00", 
                                         "2021-02-20 00:00:00")), row.names = c(NA, 15L), class = "data.frame")
    
    library(dplyr)
    #> 
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:stats':
    #> 
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #> 
    #>     intersect, setdiff, setequal, union
    
    timesdf <- timesdf |> 
      mutate(times = as.POSIXct(DateTime))
    
    head(timesdf)
    #>              DateTime      times
    #> 1 2021-02-20 00:00:00 2021-02-20
    #> 2 2021-02-20 00:00:00 2021-02-20
    #> 3 2021-02-20 00:00:00 2021-02-20
    #> 4 2021-02-20 00:00:00 2021-02-20
    #> 5 2021-02-20 00:00:00 2021-02-20
    #> 6 2021-02-20 00:00:00 2021-02-20
    
    timesdf |> 
      mutate(times = times + 1) |> 
      head()
    #>              DateTime               times
    #> 1 2021-02-20 00:00:00 2021-02-20 00:00:01
    #> 2 2021-02-20 00:00:00 2021-02-20 00:00:01
    #> 3 2021-02-20 00:00:00 2021-02-20 00:00:01
    #> 4 2021-02-20 00:00:00 2021-02-20 00:00:01
    #> 5 2021-02-20 00:00:00 2021-02-20 00:00:01
    #> 6 2021-02-20 00:00:00 2021-02-20 00:00:01
    

    Created on 2021-09-16 by the reprex package (v2.0.1)

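    The warnings you get from your as.POSIXct calls ("unknown timezone") arise because the second argument of as.POSIXct is tz, so your format string was interpreted as a time zone name. As the code above shows, you don't have to specify the format for timestamps in this layout; if you do need one, pass it by name with format = so that it is not matched to tz.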
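
As a rough illustration of both points, here is a minimal base-R sketch. It assumes a small two-element vector rather than the full data frame, and it sets tz = "UTC" only to keep the output reproducible; neither choice comes from the original question. It passes the format by name and then asks format() for the time explicitly:

```r
# Parse with the format passed by name, so it is not mistaken for `tz`.
# tz = "UTC" is an assumption made here for reproducible output.
x <- as.POSIXct(c("2021-02-20 00:00:00", "2021-02-20 00:00:00"),
                format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Default formatting drops the time because every element is exactly midnight.
format(x)
#> [1] "2021-02-20" "2021-02-20"

# An explicit format string always shows the time component.
shown <- format(x, "%Y-%m-%d %H:%M:%S")
shown
#> [1] "2021-02-20 00:00:00" "2021-02-20 00:00:00"

# Once a value is no longer midnight, the default printing shows the time again.
format(x + 1)
#> [1] "2021-02-20 00:00:01" "2021-02-20 00:00:01"
```

Note that format() returns character strings, so it is probably best used only for display or export, keeping the POSIXct column itself for any calculations.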