r, dataframe, grouping, unique, cumulative-sum

Assign an ID to consecutive groups in a column in R


I would like to add a column to a data.frame that gives, for each row, the index of the consecutive run it belongs to within its group (the s column in dummy_df):

library(data.table)  # provides rleid()

dummy_df = data.frame(s = c("a", "a", "b", "b", "b", "c", "c", "a", "a", "c", "c", "a", "a"),
                      desired_output = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3))
dummy_df$rleid_output = rleid(dummy_df$s)
dummy_df
   s desired_output rleid_output
1  a              1            1
2  a              1            1
3  b              1            2
4  b              1            2
5  b              1            2
6  c              1            3
7  c              1            3
8  a              2            4
9  a              2            4
10 c              2            5
11 c              2            5
12 a              3            6
13 a              3            6

I would say it's similar to what rleid() does, but with the count restarting when a new group is seen. However, I can't find a straightforward way to do it. Thanks.


Solution

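  • You can combine rle() with ave(): run-length encode s, number the runs within each distinct value, then expand that number back to one entry per row: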

    dummy_df$out <- with(rle(dummy_df$s), rep(ave(lengths, values, FUN = seq_along), lengths))
    

    Result:

       s desired_output out
    1  a              1   1
    2  a              1   1
    3  b              1   1
    4  b              1   1
    5  b              1   1
    6  c              1   1
    7  c              1   1
    8  a              2   2
    9  a              2   2
    10 c              2   2
    11 c              2   2
    12 a              3   3
    13 a              3   3
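
To see why the one-liner works, it may help to unpack it into its three steps. The following is a minimal base R sketch using the same s values as the question; the intermediate names runs and run_index are introduced here for illustration only and are not part of the original answer.

```r
# Same values as the s column in the question.
s <- c("a", "a", "b", "b", "b", "c", "c", "a", "a", "c", "c", "a", "a")

# Step 1: run-length encode s, giving one entry per run of identical values.
runs <- rle(s)
runs$values   # "a" "b" "c" "a" "c" "a"
runs$lengths  # 2 3 2 2 2 2

# Step 2: within each distinct value, number its runs in order of appearance.
run_index <- ave(runs$lengths, runs$values, FUN = seq_along)
run_index     # 1 1 1 2 2 3

# Step 3: repeat each run's number once per row in that run.
out <- rep(run_index, runs$lengths)
out           # 1 1 1 1 1 1 1 2 2 2 2 3 3
```

Note that rle() expects an atomic vector, so if s is stored as a factor (for example, in a data.frame created with stringsAsFactors = TRUE), it would likely need to be wrapped in as.character() first.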