Search code examples
rdataframesequencenested-loops

How to reset a sequence in a R dataframe's column, based on the levels of a different column


I have data that logs subject and session information in columns like so:

> Subject <- c(rep("A", 6), rep("B", 7))
> Session <- c(rep(1, 3), rep(2, 3), rep(8, 4), rep(9, 3))
> df <- data.frame(Subject, Session)
> df
   Subject Session
1        A       1
2        A       1
3        A       1
4        A       2
5        A       2
6        A       2
7        B       8
8        B       8
9        B       8
10       B       8
11       B       9
12       B       9
13       B       9

The problem is that the raw data does not always have consistent session numbering (hence why subject B's first session starts at session 8 and not 1). Ideally, it would be helpful to have a column that resets the session numbering when each new subject appears. Like this:

   Subject Session New_Sess
1        A       1        1
2        A       1        1
3        A       1        1
4        A       2        2
5        A       2        2
6        A       2        2
7        B       8        1
8        B       8        1
9        B       8        1
10       B       8        1
11       B       9        2
12       B       9        2
13       B       9        2

I've attempted a solution with nested for loops and could not manage to get that to work. I suspect there must be a straightforward solution to this (it doesn't seem like it should be that difficult) but despite my best efforts, it has evaded me completely.

The actual dataset is quite large and cumbersome, so an automated solution to this problem would be helpful.


Solution

  • We may use match

    library(dplyr)
    df <- df %>%
       group_by(Subject) %>% 
       mutate(New_Sess = match(Session, unique(Session))) %>%
       ungroup
    

    -output

    df
    # A tibble: 13 × 3
       Subject Session New_Sess
       <chr>     <dbl>    <int>
     1 A             1        1
     2 A             1        1
     3 A             1        1
     4 A             2        2
     5 A             2        2
     6 A             2        2
     7 B             8        1
     8 B             8        1
     9 B             8        1
    10 B             8        1
    11 B             9        2
    12 B             9        2
    13 B             9        2
    

    Or with factor

    df <-  df %>%
         group_by(Subject) %>% 
         mutate(New_Sess = as.integer(factor(Session))) %>% 
         ungroup