r, time, format, lubridate

Creating hour column from HH:MM:SS data in R


I'm trying to create a data column that lists only the hour each observation took place from time data that's formatted as HH:MM:SS in R.

I want to do this so I can group observations in my dataset by the hour they took place.

Example time vector: a <- c("22:00:03", "22:00:05", "22:00:07", "22:00:09")
Desired output: [1] 22 22 22 22

My dataset is very large so creating separate vectors with the observations based on hour feels time-consuming.

I've explored the 'lubridate' package but I'm having trouble finding a solution.

Any help is appreciated!


Solution

  • You could use a regular expression to get the first two digits:

    as.numeric(gsub("(^\\d{2}).*", "\\1", a))
    # [1] 22 22 22 22
    

    Convert it to POSIXlt and then format it to extract the hour:

    as.numeric(format(strptime(a, "%H:%M:%S"), "%H"))
    # [1] 22 22 22 22
    

    In lubridate you can convert it using hms(), which transforms a character vector in the format HH:MM:SS into a period object, and then extract the hour component with hour():

    library(lubridate)
    
    hour(hms(a))
    # [1] 22 22 22 22
    

    The dttr2 package has a similar syntax to lubridate for this:

    library(dttr2)
    
    dtt_hour(dtt_time(a))
    # [1] 22 22 22 22
    

    Split it by : and take the first element:

    as.numeric(sapply(strsplit(a, ":", fixed = TRUE), `[[`, 1))
    # [1] 22 22 22 22
    

    stringr added str_split_i() in version 1.5.0, which lets you split a string and select one piece by index:

    library(stringr)
    
    as.numeric(str_split_i(a, ":", 1))
    # [1] 22 22 22 22
    

    The strex package wraps common regular-expression tasks in helper functions; str_before_first() extracts everything before the first match of a pattern:

    library(strex)
    
    as.numeric(str_before_first(a, ":"))
    # [1] 22 22 22 22
    

    Use the package datetime to turn it into a time object, convert it to numeric (which will return seconds) and convert that to hours:

    library(datetime)
    
    as.numeric(as.time(a)) / (60 * 60)
    # [1] 22 22 22 22
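
    Since the goal is to group observations by hour, here is a minimal sketch of how the extracted hour could be stored as a column and used for grouping in base R. The data frame `obs` and its columns `time` and `value` are hypothetical names made up for illustration; substitute your own data. It uses the strsplit() approach from above, but any of the other methods should slot in the same way.

```r
# Hypothetical example data: `obs`, `time`, and `value` are illustrative
# names, not taken from the question.
obs <- data.frame(
  time  = c("22:00:03", "22:00:05", "23:15:07", "23:59:09", "00:01:00"),
  value = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Add an hour column: split each time on ":" and keep the first piece.
obs$hour <- as.integer(sapply(strsplit(obs$time, ":", fixed = TRUE), `[[`, 1))

# Count observations per hour.
counts <- table(obs$hour)

# Sum `value` within each hour (hours 0, 22, 23 give sums 5, 3, 7).
sums <- aggregate(value ~ hour, data = obs, FUN = sum)
```

    Because the column is computed in one vectorised step, there is no need to build a separate vector per hour, even for a large dataset.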