r dataframe dplyr grouping summarize

R - calculate value by each separate instance of a variable in a data frame


I have a data frame in which each value of a variable occurs in several separate runs of consecutive rows, and every row has a duration. I would like to collapse each run into a single row and sum its durations.

I've provided an example and expected outcome below.

  • Example data frame:
x <- c("w", "i", "i", "w", "w", "w", "i")
duration <- c(3,4,4,2,1,5,6)

xd <- cbind(x,duration)
  • What I would like the output to be is:
x_group <- c("w", "i", "w", "i")
duration_group <- c(3,8,8,6)

xd_wanted_outcome<-cbind(x_group,duration_group)

I have tried using group_by, but it groups all instances of a value together instead of keeping the separate runs apart.


Solution

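  • library(dplyr) # consecutive_id() and .by require dplyr >= 1.1.0
    
    # cbind() would give a character matrix, so build a data frame instead
    xd <- data.frame(x, duration)
    
    xd |>
      # give each run of consecutive identical x values its own id
      mutate(grp = consecutive_id(x)) |>
      # mutate(grp = cumsum(x != lag(x, 1, ""))) |> # alternative option
      # sum the durations within each run, then drop the helper column
      summarize(duration = sum(duration), .by = c(x, grp)) |>
      select(-grp)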
    
      x duration
    1 w        3
    2 i        8
    3 w        8
    4 i        6
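
If dplyr is not available, the same run-based grouping can probably be reproduced in base R. The following is only a minimal sketch, assuming the vectors x and duration from the question; the names runs, grp and result are made up for illustration. rle() finds the runs of consecutive identical values, and rep() expands the run index back to one id per row, which plays the role of the grp column above.

```r
x <- c("w", "i", "i", "w", "w", "w", "i")
duration <- c(3, 4, 4, 2, 1, 5, 6)

# rle() records each run of consecutive identical values and its length
runs <- rle(x)

# one id per row: 1 2 2 3 3 3 4 (hypothetical helper, like grp above)
grp <- rep(seq_along(runs$lengths), runs$lengths)

# sum the durations within each run and pair them with the run values
result <- data.frame(
  x = runs$values,
  duration = as.vector(tapply(duration, grp, sum))
)

print(result)
```

Because the id increases every time x changes, the two separate runs of "w" get different ids and are summed separately, which is what a plain grouping on x alone does not do.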