Creating hour column from HH:MM:SS data in R

I'm trying to create a column that holds only the hour in which each observation took place, using time data formatted as HH:MM:SS in R.

I want to do this so I can group observations in my dataset by the hour they took place.

Example time vector: a <- c("22:00:03", "22:00:05", "22:00:07", "22:00:09")
Desired output: [1] 22 22 22 22

My dataset is very large, so creating a separate vector of observations for each hour would be time-consuming.

I've tried exploring the lubridate package, but I'm having trouble finding a solution.

Any help is appreciated!

Solution

You could use a regular expression to get the first two digits:

as.numeric(gsub("(^\\d{2}).*", "\\1", a))
# [1] 22 22 22 22
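The pattern above assumes exactly two hour digits. If some times might be written without a leading zero (for example "9:05:00", a hypothetical input not in the question), a sketch that captures one or two digits before the first colon is:

```r
# Hypothetical mix of two-digit and one-digit hours
a <- c("22:00:03", "9:05:00")

# Capture the 1-2 digits before the first colon and drop everything after it
hours <- as.numeric(sub("^(\\d{1,2}):.*", "\\1", a))
hours
# [1] 22  9
```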

Convert it to POSIXlt with strptime() and then format it to extract the hour:

as.numeric(format(strptime(a, "%H:%M:%S"), "%H"))
# [1] 22 22 22 22
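Because strptime() already returns a POSIXlt object, the hour is also stored as one of its components, so the format() round trip can be skipped. A sketch using the full "%H:%M:%S" format:

```r
a <- c("22:00:03", "22:00:05", "22:00:07", "22:00:09")

# POSIXlt keeps each date-time field separately; $hour is an integer from 0 to 23
lt <- strptime(a, "%H:%M:%S")
hours <- lt$hour
hours
# [1] 22 22 22 22
```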

In lubridate, hms() converts a character vector in the format HH:MM:SS to a Period object, and hour() then extracts the hour component:

library(lubridate)

hour(hms(a))
# [1] 22 22 22 22

The dttr2 package has a similar syntax to lubridate for this:

library(dttr2)

dtt_hour(dtt_time(a))
# [1] 22 22 22 22

Split each string on ":" and take the first element:

as.numeric(sapply(strsplit(a, ":", fixed = TRUE), `[[`, 1))
# [1] 22 22 22 22
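Splitting can also keep every component at once. A sketch that binds the pieces into a character matrix (one row per time; columns for hours, minutes and seconds) and converts the first column:

```r
a <- c("22:00:03", "22:00:05", "22:00:07", "22:00:09")

# Each list element is c("HH", "MM", "SS"); rbind them into a 4 x 3 character matrix
parts <- do.call(rbind, strsplit(a, ":", fixed = TRUE))

# The first column holds the hours
hours <- as.numeric(parts[, 1])
hours
# [1] 22 22 22 22
```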

stringr 1.5.0 added str_split_i(), which splits a string and selects one piece by index:

library(stringr)

as.numeric(str_split_i(a, ":", 1))
# [1] 22 22 22 22

The strex package wraps common regular-expression tasks in simpler functions; str_before_first() extracts everything before the first match of a pattern:

library(strex)

as.numeric(str_before_first(a, ":"))
# [1] 22 22 22 22

Use the datetime package to turn it into a time object, convert that to numeric (which returns seconds), divide by 3600 to get hours, and round down with floor() so only the whole hour remains:

library(datetime)

floor(as.numeric(as.time(a)) / (60 * 60))
# [1] 22 22 22 22
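Since the goal in the question is to group observations by hour, the extracted hours can be stored as a column and used directly as a grouping variable, with no separate vector per hour. A minimal base-R sketch, assuming a hypothetical data frame df with a character column time and a numeric column value (both names are illustrative, not from the question):

```r
# Hypothetical data: times as "HH:MM:SS" strings plus a measurement
df <- data.frame(
  time  = c("21:59:58", "22:00:03", "22:00:05", "23:10:00"),
  value = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Hour column via the split approach (any of the methods above would work)
df$hour <- as.numeric(sapply(strsplit(df$time, ":", fixed = TRUE), `[[`, 1))

# Number of observations in each hour
counts <- table(df$hour)

# Mean of value within each hour
means <- aggregate(value ~ hour, data = df, FUN = mean)
```

The same hour column works as a grouping key in other tools as well, for example dplyr's group_by().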