Tags: r, dplyr, grouping, sequence

How to count groupings of elements in base R or dplyr?


I am trying to count a rather complicated series of element groupings, using base R or dplyr.

Suppose we start with the data frame myDF shown below. I would like to add a column "grpCnt", which I have inserted manually below and explained to the right. Any suggestions for how to do this? I have experimented with seq() and dplyr::dense_rank(), but I imagine there is a more straightforward way to do this sort of thing.

> myDF
  Element Code  grpCnt  # grpCnt explanation
1       R  1.0     1.0  # first incidence of R so use Code value
2       R  2.1     2.1  # since there is a decimal value use Code value (2 integer means 2nd incidence of R, .1 decimal value means this Element has been previously grouped and is first in that group)
3       X  1.0     1.0  # first incidence of X so use Code value
4       X  2.1     2.1  # since there is a decimal value in Code use Code value
5       X  2.2     2.2  # since there is a decimal value in Code use Code value
6       X  3.1     3.1  # since there is a decimal value in Code use Code value
7       X  3.2     3.2  # since there is a decimal value in Code use Code value
8       X  5.0     4.0  # no decimal value in Code value; since X's above show prefix sequence of 1,2,2,3,3, continue this X with 4 (not 5)
9       B  1.0     1.0  # first incidence of B so use Code value
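
The explanations above rely on two parts of each Code: the integer prefix and the decimal part. As a minimal sketch (the variable names `prefix` and `is_grouped` are illustrative and not part of the original data), the two parts can be separated like this:

```r
# Code values copied from the table above.
Code <- c(1, 2.1, 1, 2.1, 2.2, 3.1, 3.2, 5, 1)

# Integer prefix: the incidence number of the Element.
prefix <- floor(Code)

# TRUE when Code has a decimal part, i.e. the row belongs to a group.
is_grouped <- Code %% 1 > 0

print(prefix)      # 1 2 1 2 2 3 3 5 1
print(is_grouped)  # FALSE TRUE FALSE TRUE TRUE TRUE TRUE FALSE FALSE
```

Testing `Code %% 1 > 0` is safer than comparing the remainder with a specific value such as 0.1, because decimal remainders are only approximate in floating point while whole-number codes like 5 leave a remainder of exactly 0.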

myDF <- data.frame(
  Element = c('R','R','X','X','X','X','X','X','B'),
  Code = c(1,2.1,1,2.1,2.2,3.1,3.2,5,1)
)

Output of myDF without the above comments (the code that generates it is immediately above):

> myDF
  Element Code
1       R  1.0
2       R  2.1
3       X  1.0
4       X  2.1
5       X  2.2
6       X  3.1
7       X  3.2
8       X  5.0
9       B  1.0

Solution

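    library(dplyr)

    myDF %>%
      group_by(Element) %>%
      # Keep Code for the first row of each Element or when Code has a decimal part;
      # otherwise continue from the integer part of the previous Code.
      mutate(grpCnt = if_else(row_number() == 1 | Code %% 1 > 0,
        Code, floor(lag(Code)) + 1)) %>%
      ungroup()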
    
    
    # A tibble: 9 × 3
      Element  Code grpCnt
      <chr>   <dbl>  <dbl>
    1 R         1      1  
    2 R         2.1    2.1
    3 X         1      1  
    4 X         2.1    2.1
    5 X         2.2    2.2
    6 X         3.1    3.1
    7 X         3.2    3.2
    8 X         5      4  
    9 B         1      1
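
Since the question also asks about base R, here is a minimal sketch of the same rule without dplyr. It assumes the rows are already ordered within each Element, as in the sample data, and the helper name `grp_cnt` is hypothetical:

```r
# Same sample data as in the question.
myDF <- data.frame(
  Element = c('R','R','X','X','X','X','X','X','B'),
  Code = c(1,2.1,1,2.1,2.2,3.1,3.2,5,1)
)

# Hypothetical helper: applies the rule to the Code values of one Element.
grp_cnt <- function(code) {
  # Previous Code within the group (plays the role of dplyr::lag).
  prev <- c(NA, head(code, -1))
  # Keep Code on the first row or when it has a decimal part.
  keep <- seq_along(code) == 1 | code %% 1 > 0
  ifelse(keep, code, floor(prev) + 1)
}

# ave() applies the helper per Element and returns values in the original row order.
myDF$grpCnt <- ave(myDF$Code, myDF$Element, FUN = grp_cnt)
print(myDF)
```

Like the dplyr version, this looks only at the previous Code, not at the previous grpCnt. If an Element had two whole-number codes in a row (say 5 followed by 7), the second would become 6 rather than continuing from the adjusted count. The question does not show such a case, so the intended result there is an open assumption.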