I'm trying to create a data column that lists only the hour each observation took place from time data that's formatted as HH:MM:SS in R.
I want to do this so I can group observations in my dataset by the hour they took place.
Example time vector: a <- c("22:00:03", "22:00:05", "22:00:07", "22:00:09")
Desired output: [1] 22 22 22 22
My dataset is very large so creating separate vectors with the observations based on hour feels time-consuming.
I've tried exploring the 'lubridate' package but I'm having trouble finding a solution.
Any help is appreciated!
You could use a regular expression to get the first two digits:
as.numeric(gsub("(^\\d{2}).*", "\\1", a))
# [1] 22 22 22 22
Convert it to POSIXlt and then format it to extract the hour:
as.numeric(format(strptime(a, "%H:%M:%S"), "%H"))
# [1] 22 22 22 22
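As a small variation on the POSIXlt approach, the object returned by strptime() already stores the hour as a component, so it can be read directly instead of being formatted back to a string. A minimal base R sketch, with the example vector redefined so the snippet runs on its own:

```r
# Example vector from the question, quoted so it is a character vector
a <- c("22:00:03", "22:00:05", "22:00:07", "22:00:09")

# strptime() returns a POSIXlt object; its hour component is an integer vector
hours <- strptime(a, "%H:%M:%S")$hour
hours
# [1] 22 22 22 22
```

This returns integers rather than doubles, which is usually fine for grouping.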
In lubridate you can convert it using hms(), which will transform a character vector in the format HH:MM:SS to a period object, and then extract the hour component:
library(lubridate)
hour(hms(a))
# [1] 22 22 22 22
The dttr2 package has a similar syntax to lubridate for this:
library(dttr2)
dtt_hour(dtt_time(a))
# [1] 22 22 22 22
Split it on ":" and take the first element:
as.numeric(sapply(strsplit(a, ":", fixed = TRUE), `[[`, 1))
# [1] 22 22 22 22
stringr implemented str_split_i() in package version 1.5.0, which allows you to split a string and select an element by index:
library(stringr)
as.numeric(str_split_i(a, ":", 1))
# [1] 22 22 22 22
The strex package wraps common regular-expression tasks in helper functions; str_before_first() extracts everything before the first match of a pattern:
library(strex)
as.numeric(str_before_first(a, ":"))
# [1] 22 22 22 22
Use the package datetime to turn it into a time object, convert it to numeric (which will return seconds) and convert that to hours:
library(datetime)
as.numeric(as.time(a)) / (60 * 60)
# [1] 22 22 22 22
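To tie any of these approaches back to the stated goal of grouping observations by hour, the extracted values can be stored as a new column and used as a grouping key. The sketch below uses only base R; the data frame obs and its columns time and value are hypothetical names invented for illustration, not taken from the question's dataset.

```r
# Hypothetical data: `obs`, `time` and `value` are made-up names for illustration
obs <- data.frame(
  time  = c("21:59:58", "22:00:03", "22:00:05", "23:10:00"),
  value = c(1.5, 2.0, 3.0, 4.5),
  stringsAsFactors = FALSE
)

# Add an hour column by keeping the first two digits of each time string
obs$hour <- as.numeric(sub("^(\\d{2}).*", "\\1", obs$time))

# Number of observations in each hour
counts <- table(obs$hour)

# Mean of `value` within each hour
means <- aggregate(value ~ hour, data = obs, FUN = mean)
```

Because the hour column is computed in one vectorised call, there is no need to build a separate vector per hour, even for a large dataset.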