Search code examples
rdatetimetimesumposixct

Sum POSIXct POSIXt values from a column


I have one data frame with more than 1000 rows, and I have one column duracao which I want to sum the values that have there HH:MM, and resulting also the value as HH:MM.

df <- data.frame(duracao = c("00:29", "00:26", "00:38"))

#The sum should be 01:33

First I tried:

class(df$duracao)
>character

df$duracao<-as.numeric(df$duracao)

df <- df %>%
  mutate(duracao = seconds_to_period(duracao * 86400))

df <- df %>% #aqui
  mutate(duracao = sprintf("%02d:%02d", hour(duracao), minute(duracao)))

total_duracao <- sum(df$duracao)

Then second try:

df$duracao <- as.POSIXct(df$duracao, format = "%H:%M", tz = "UTC")
df <- df %>%
  mutate(duracao_sec = as.numeric(duracao - as.POSIXct("1970-01-01", tz = "UTC")))
total_seconds <- sum(df$duracao_sec)                                  
total_hours <- total_seconds %/% 3600
total_minutes <- (total_seconds %% 3600) %/% 60
total_duracao <- sprintf("%02d:%02d", total_hours, total_minutes)
print(total_duracao)

Update

I found these weird values in the column

I ran this code

which(!grepl("\\d{2}:\\d{2}", df$duracao))

And the values were:

 -1:-20 , -1:-00 and -1:-11

it should be 1:20, 1:00 and 1:11, how can I change it, please?


Solution

  • library(dplyr)
    library(lubridate)
    
    df |>
      mutate(duracao = gsub("-", "", duracao, fixed = TRUE),
             period = as.duration(hm(duracao))) |>
      summarize(period = seconds_to_period(sum(period)),
                hh_mm = paste0(sum(day(period * 24) + hour(period)), ":", minute(period)))
    #      period hh_mm
    # 1 1H 33M 0S  1:33
    

    If you don't want the values in a data frame then you can simply use this pipe chain (and ignore the library(dplyr) statement):

    library(lubridate)
    
    df$duracao |>
      gsub("-", "", x = _, fixed = TRUE) |>
      hm() |>
      as.duration() |>
      sum() |>
      seconds_to_period()
    

    Note that this returns a Period object. This makes it easy to extract components if you need:

    total_duracao <- df$duracao |>
      gsub("-", "", x = _, fixed = TRUE) |>
      hm() |>
      as.duration() |>
      sum() |>
      seconds_to_period()
    
    hour(total_duracao)
    # [1] 1
    
    minute(total_duracao)
    # [1] 33
    
    second(total_duracao)
    # [1] 0