Search code examples
rdatetimetype-conversionlubridate

Problem converting time series df from chr to date using as.POSIXct


I'm importing a large high resolution time series hydrometric dataset from government website as a csv file. The column 'x.Timestamp' is importing as chr with some unusual characters but when I try to convert it from chr to date, NA is returned. I would be really grateful for any help.

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
OW_HR_Discharge_DataV2<- data.frame(
  stringsAsFactors = FALSE,
  X.Timestamp = c("1 1955-11-08T00:00:00.000Z",
                  "2 1955-11-08T00:15:00.000Z","3 1955-11-08T00:30:00.000Z",
                  "4 1955-11-08T00:45:00.000Z",
                  "5 1955-11-08T01:00:00.000Z","6 1955-11-08T01:15:00.000Z"),
  Value = c(10.227, 10.226, 10.228, 10.227, 10.227, 10.227),
  Quality.Code = c(31L, 31L, 31L, 31L, 31L, 31L)
)
str(OW_HR_Discharge_DataV2)
#> 'data.frame':    6 obs. of  3 variables:
#>  $ X.Timestamp : chr  "1 1955-11-08T00:00:00.000Z" "2 1955-11-08T00:15:00.000Z" "3 1955-11-08T00:30:00.000Z" "4 1955-11-08T00:45:00.000Z" ...
#>  $ Value       : num  10.2 10.2 10.2 10.2 10.2 ...
#>  $ Quality.Code: int  31 31 31 31 31 31
OW_HR_Discharge_DataV2$X.Timestamp<-as.POSIXct(OW_HR_Discharge_DataV2$X.Timestamp, format = "%Y-%m-%d %H:%M:%S", tz="GMT")
str(OW_HR_Discharge_DataV2)
#> 'data.frame':    6 obs. of  3 variables:
#>  $ X.Timestamp : POSIXct, format: NA NA ...
#>  $ Value       : num  10.2 10.2 10.2 10.2 10.2 
...
#>  $ Quality.Code: int  31 31 31 31 31 31

Solution

  • As @Sophie said with other words, your csv's row-numbers appears to be at the beginning of OW_HR_Discharge_DataV2. Remove it with stringr::str_remove() and parse the timestamp with lubridate::ymd_hms():

    library(tidyverse)
    
    # ------------------
    OW_HR_Discharge_DataV2 <- OW_HR_Discharge_DataV2 %>% 
      as_tibble() %>% 
      mutate(
        new_timestamp = X.Timestamp %>% 
          str_remove("^\\d+\\s") %>% 
          ymd_hms(tz = "GMT"))
    
    # Output ----------
    > OW_HR_Discharge_DataV2
    
    # A tibble: 6 × 4
      X.Timestamp                Value Quality.Code new_timestamp      
      <chr>                      <dbl>        <int> <dttm>             
    1 1 1955-11-08T00:00:00.000Z  10.2           31 1955-11-08 00:00:00
    2 2 1955-11-08T00:15:00.000Z  10.2           31 1955-11-08 00:15:00
    3 3 1955-11-08T00:30:00.000Z  10.2           31 1955-11-08 00:30:00
    4 4 1955-11-08T00:45:00.000Z  10.2           31 1955-11-08 00:45:00
    5 5 1955-11-08T01:00:00.000Z  10.2           31 1955-11-08 01:00:00
    6 6 1955-11-08T01:15:00.000Z  10.2           31 1955-11-08 01:15:00
    

    Note1: You don't really need to make your data frame as_tibble() but it makes sense in a tidyverse solution.

    Note2: when working with dates, make sure your timezone is in base::OlsonNames() ("GMT" is) and avoid problems.

    Hope it helps.