I have some data where a different number of rows corresponds to one ID (e.g. person). Here is what that looks like:
label | response
-----------+------------
'consent' | 'yes'
'age' | '34'
'gender' | 'female'
'language' | 'english'
'education'| 'college'
'consent' | 'yes'
'age ' | '37'
'gender ' | 'male'
'language' | 'english'
'education'| 'high school'
'race' | 'white'
These responses correspond to two people, one who included a response on race while the other didn't. Since everyone has an answer to consent
, I was wondering if there was a way to assign person ID conditional on the consent label. E.g. if label=consent, assign same ID for every row until next consent. For example, I would want the data to look like this:
label | response | ID
-----------+------------+------
'consent' | 'yes' | 1
'age' | '34' | 1
'gender' | 'female' | 1
'language' | 'english' | 1
'education'| 'college' | 1
'consent' | 'yes' | 2
'age ' | '37' | 2
'gender ' | 'male' | 2
'language' | 'english' | 2
'education'| 'HS' | 2
'race' | 'white' | 2
I've tried a number of for loops and if statements, but haven't figured out a way to do it. Hopefully it's possible to do.
Thanks!
Does this work:
library(dplyr)
df %>% mutate(ID = cumsum(label == 'consent'))
# A tibble: 11 x 3
label response ID
<chr> <chr> <int>
1 consent yes 1
2 age 34 1
3 gender female 1
4 language english 1
5 education college 1
6 consent yes 2
7 age 37 2
8 gender male 2
9 language english 2
10 education high school 2
11 race white 2