Search code examples
rdataframedatereshape2dcast

Casting a dataframe with datetime columns


I have built a dataframe with R reading some events from my email. Basically, the structure of the final dataframe is as follows:

'data.frame':   74 obs. of  7 variables:
 $ process_name : Factor w/ 2 levels : 1 1 1 1 2 2 2 2 2 2 ...
 $ job_code : chr  "TRB1619825404" "TRB1619825404" "TRB1619825404" "TRB1619825404" ...
 $ phase           : Factor w/ 7 levels ,..: 4 4 6 6 4 5 7 1 3 2 ...
 $ stage          : Factor w/ 2 levels "End","Start": 2 1 2 1 2 1 2 2 2 2 ...
 $ date          : POSIXct, format: "2021-04-30 23:30:04" "2021-05-01 01:57:26" "2021-05-01 01:57:26" "2021-05-01 02:25:26" ...
 $ execution_date: Date, format: "2021-04-30" "2021-05-01" "2021-05-01" "2021-05-01" ...
 $ execution_time : 'hms' num  23:30:04 01:57:26 01:57:26 02:25:26 ...
  ..- attr(*, "units")= chr "secs"

Every event has an associated date and time for its start and end. What I want to do (in order to compute the duration of the event) is to cast the dataframe in something like this:

'data.frame'
 $ process_name
 $ job_code
 $ phase
 $ start_date
 $ end_date
 $ duration         

I tried to use dcast, but it uses a default aggregation function and I just want to reshape the dataframe. Any ideas?


Solution

  • First, I'm guessing your data looks like this:

    dat <- structure(list(process_name = structure(c(1L, 1L, 1L, 1L), .Label = c("L01", "L02"), class = "factor"), job_code = c("TRB1619825404", "TRB1619825404", "TRB1619825404", "TRB1619825404"), phase = structure(c(3L, 3L, 5L, 5L), .Label = c("L02", "L03", "L04", "L05", "L06", "L07"), class = "factor"), stage = structure(c(2L, 1L, 2L, 1L), .Label = c("End", "Start" ), class = "factor"), date = structure(c(1619825404, 1619834246, 1619834246, 1619835926), class = c("POSIXct", "POSIXt"), tzone = "UTC"), execution_date = structure(c(18747, 18748, 18748, 18748), class = "Date"), execution_time = c("30:04", "01:57:26", "01:57:26", "02:25:26" )), row.names = c(NA, 4L), class = "data.frame")
    

    (If not, please provide your data in something we can use.)

    From here:

    out <- reshape2::dcast(dat[,1:5], process_name + job_code + phase ~ stage, value.var = "date")
    out[,c("End","Start")] <- lapply(out[,c("End","Start")], as.POSIXct, origin = "1970-01-01", tz = "UTC")
    out$duration <- difftime(out$End, out$Start, units = "mins")
    out
    #   process_name      job_code phase                 End               Start      duration
    # 1          L01 TRB1619825404   L04 2021-05-01 01:57:26 2021-04-30 23:30:04 147.3667 mins
    # 2          L01 TRB1619825404   L06 2021-05-01 02:25:26 2021-05-01 01:57:26  28.0000 mins
    

    I had to re-as.POSIXct it because dcast cast them to numeric (not sure how to fix that).