Search code examples
rdplyrrow-number

Generate column id


I am working with log data; trying to find the round number of each event. The start of a round is signaled by action=="start". I want to create a "action.round" columns that tells me which round each event corresponds to.

I have data such this:

    data <- read_table2("Id action 
A   start
A   na
A   start
A   na
A   na
A   na
A   na
A   start
B   start
B   na
B   start
B   na
B   start
B   na" 

I am trying to create an output such as this:

output <- read_table2("Id   action  action.round
A   start   1
A   na  1
A   start   2
A   na  2
A   na  2
A   na  2
A   na  2
A   start   3
B   start   1
B   na  1
B   start   2
B   na  2
B   start   3
B   na  3")

So far, I have been able to get part of the output by using row_number(), like this:

` data %>% 
  mutate(round.start=case_when(actionValue=="start"~"start",TRUE~"NA")) %>%
  ungroup() %>%
  group_by(Id,round.start) %>%
  mutate(action.round=row_number())`

But now, I would like to fill the round number that corresponds to round.start=="start" into the column, so that I know which round number each column actually corresponds to (see desired output above).


Solution

  • You could use cumsum after grouping by Id.

    library(dplyr)
    data %>% group_by(Id) %>% mutate(action.round = cumsum(action == "start"))
    
    #   Id    action action.round
    #  <chr>  <chr>         <int>
    # 1 A     start            1
    # 2 A     na               1
    # 3 A     start            2
    # 4 A     na               2
    # 5 A     na               2
    # 6 A     na               2
    # 7 A     na               2
    # 8 A     start            3
    # 9 B     start            1
    #10 B     na               1
    #11 B     start            2
    #12 B     na               2
    #13 B     start            3
    #14 B     na               3
    

    This can be done in base R

    data$action.round <- with(data, ave(action == "start", Id, FUN = cumsum))
    

    and data.table as well

    library(data.table)
    setDT(data)[, action.round := cumsum(action == "start"), Id]
    

    data

    data <- structure(list(Id = c("A", "A", "A", "A", "A", "A", "A", "A", 
    "B", "B", "B", "B", "B", "B"), action = c("start", "na", "start", 
    "na", "na", "na", "na", "start", "start", "na", "start", "na", 
    "start", "na")), row.names = c(NA, -14L), spec = structure(list(
    cols = list(Id = structure(list(), class = c("collector_character", 
    "collector")), action = structure(list(), class = c("collector_character", 
    "collector")), action.round = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1), class = "col_spec"), class = c("spec_tbl_df", 
    "tbl_df", "tbl", "data.frame"))