r dplyr grouping

How to count cumulative number of implied groupings in a single column of a dataframe in base R or dplyr?


Suppose we start with this data frame myDF generated by the code immediately beneath:

> myDF
  index
1     2
2     2
3     4
4     4
5     6
6     6
7     6

Generating code: myDF <- data.frame(index = c(2,2,4,4,6,6,6))

I'd like to add a column cumGrp to myDF that numbers each run of adjacent identical index values, as illustrated below. Any suggestions for simple, concise base R or dplyr code to do this?

> myDF
  index cumGrp   cumGrp explained
1     2      1   1st grouping of same index numbers (2) adjacent to each other
2     2      1   Same as above
3     4      2   2nd grouping of same index numbers (4) adjacent to each other
4     4      2   Same as above
5     6      3   3rd grouping of same index numbers (6) adjacent to each other
6     6      3   Same as above
7     6      3   Same as above

Solution

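  • Many possible ways. Only the cumsum version numbers runs of adjacent values; the others number distinct values, which gives the same result here because each value occurs in a single run: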

    dplyr::cur_group_id

    library(dplyr)
    myDF %>% 
      group_by(index) %>% 
      mutate(cumGrp = cur_group_id()) %>% 
      ungroup()  # drop the grouping so later steps see a plain data frame
    

    cumsum

    library(dplyr)
    myDF %>% 
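      # compare each row with the previous one; using first(index) as the default
      # keeps the first id at 1 even when the first value is 0
      mutate(cumGrp = cumsum(index != lag(index, default = first(index))) + 1)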
    

    as.numeric + factor

    myDF |>
      transform(cumGrp = as.numeric(factor(index)))
    

    data.table::.GRP

    library(data.table)
    setDT(myDF)[, cumGrp := .GRP, by = index]
    

    match

    myDF |>
      transform(cumGrp = match(index, unique(index))) 
    

    collapse::group

    library(collapse)
    myDF |>
      settransform(cumGrp = group(index))
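
The approaches above agree on the question's data, but they can diverge when a value reappears after a gap. The following is a minimal base R sketch of that difference, not a definitive implementation. The vector x is a hypothetical input made up for illustration, and the names by_value, by_run and by_rle are likewise illustrative.

```r
# Hypothetical input in which the value 2 recurs after a gap
x <- c(2, 2, 4, 4, 2, 6, 6)

# Value-based id: numbered by first appearance of each distinct value,
# so the second run of 2 reuses id 1
by_value <- match(x, unique(x))

# Run-based id: a new id starts whenever the value differs from the previous one
by_run <- cumsum(c(TRUE, x[-1] != x[-length(x)]))

# The same run-based id via run-length encoding
r <- rle(x)
by_rle <- rep(seq_along(r$lengths), r$lengths)

data.frame(index = x, by_value, by_run, by_rle)
```

If "adjacent to each other" is the requirement, the run-based variants are probably the safer choice. dplyr (1.1.0 and later) provides consecutive_id() and data.table provides rleid() for the same run-based numbering.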