I'm importing a large high resolution time series hydrometric dataset from government website as a csv file. The column 'x.Timestamp' is importing as chr with some unusual characters but when I try to convert it from chr to date, NA is returned. I would be really grateful for any help.
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
OW_HR_Discharge_DataV2<- data.frame(
stringsAsFactors = FALSE,
X.Timestamp = c("1 1955-11-08T00:00:00.000Z",
"2 1955-11-08T00:15:00.000Z","3 1955-11-08T00:30:00.000Z",
"4 1955-11-08T00:45:00.000Z",
"5 1955-11-08T01:00:00.000Z","6 1955-11-08T01:15:00.000Z"),
Value = c(10.227, 10.226, 10.228, 10.227, 10.227, 10.227),
Quality.Code = c(31L, 31L, 31L, 31L, 31L, 31L)
)
str(OW_HR_Discharge_DataV2)
#> 'data.frame': 6 obs. of 3 variables:
#> $ X.Timestamp : chr "1 1955-11-08T00:00:00.000Z" "2 1955-11-08T00:15:00.000Z" "3 1955-11-08T00:30:00.000Z" "4 1955-11-08T00:45:00.000Z" ...
#> $ Value : num 10.2 10.2 10.2 10.2 10.2 ...
#> $ Quality.Code: int 31 31 31 31 31 31
OW_HR_Discharge_DataV2$X.Timestamp<-as.POSIXct(OW_HR_Discharge_DataV2$X.Timestamp, format = "%Y-%m-%d %H:%M:%S", tz="GMT")
str(OW_HR_Discharge_DataV2)
#> 'data.frame': 6 obs. of 3 variables:
#> $ X.Timestamp : POSIXct, format: NA NA ...
#> $ Value : num 10.2 10.2 10.2 10.2 10.2
...
#> $ Quality.Code: int 31 31 31 31 31 31
As @Sophie said with other words, your csv's row-numbers appears to be at the beginning of OW_HR_Discharge_DataV2
. Remove it with stringr::str_remove()
and parse the timestamp with lubridate::ymd_hms()
:
library(tidyverse)
# ------------------
OW_HR_Discharge_DataV2 <- OW_HR_Discharge_DataV2 %>%
as_tibble() %>%
mutate(
new_timestamp = X.Timestamp %>%
str_remove("^\\d+\\s") %>%
ymd_hms(tz = "GMT"))
# Output ----------
> OW_HR_Discharge_DataV2
# A tibble: 6 × 4
X.Timestamp Value Quality.Code new_timestamp
<chr> <dbl> <int> <dttm>
1 1 1955-11-08T00:00:00.000Z 10.2 31 1955-11-08 00:00:00
2 2 1955-11-08T00:15:00.000Z 10.2 31 1955-11-08 00:15:00
3 3 1955-11-08T00:30:00.000Z 10.2 31 1955-11-08 00:30:00
4 4 1955-11-08T00:45:00.000Z 10.2 31 1955-11-08 00:45:00
5 5 1955-11-08T01:00:00.000Z 10.2 31 1955-11-08 01:00:00
6 6 1955-11-08T01:15:00.000Z 10.2 31 1955-11-08 01:15:00
Note1: You don't really need to make your data frame as_tibble()
but it makes sense in a tidyverse
solution.
Note2: when working with dates, make sure your timezone is in base::OlsonNames()
("GMT" is) and avoid problems.
Hope it helps.