Search code examples
rdata-management

Assigning group ID to different number of rows each time in R


I have some data where a different number of rows corresponds to one ID (e.g. person). Here is what that looks like:

label      | response
-----------+------------
'consent'  |  'yes'
'age'      |  '34'
'gender'   |  'female'
'language' |  'english'
'education'|  'college'
'consent'  |  'yes'
'age '     |  '37'
'gender '  |  'male'
'language' |  'english'
'education'|  'high school'
'race'     |  'white'

These responses correspond to two people, one who included a response on race while the other didn't. Since everyone has an answer to consent, I was wondering if there was a way to assign person ID conditional on the consent label. E.g. if label=consent, assign same ID for every row until next consent. For example, I would want the data to look like this:

label      | response   |  ID
-----------+------------+------
'consent'  |  'yes'     |  1 
'age'      |  '34'      |  1
'gender'   |  'female'  |  1
'language' |  'english' |  1
'education'|  'college' |  1
'consent'  |  'yes'     |  2
'age '     |  '37'      |  2
'gender '  |  'male'    |  2
'language' |  'english' |  2
'education'|  'HS'      |  2
'race'     |  'white'   |  2

I've tried a number of for loops and if statements, but haven't figured out a way to do it. Hopefully it's possible to do.

Thanks!


Solution

  • Does this work:

    library(dplyr)
    df %>% mutate(ID = cumsum(label == 'consent'))
    # A tibble: 11 x 3
       label     response       ID
       <chr>     <chr>       <int>
     1 consent   yes             1
     2 age       34              1
     3 gender    female          1
     4 language  english         1
     5 education college         1
     6 consent   yes             2
     7 age       37              2
     8 gender    male            2
     9 language  english         2
    10 education high school     2
    11 race      white           2