I have a dataframe (fbwb) with multiple assessments of bullying (1-6) using multiple measures (1-3) in a group of participants. The df looks like this:
fbwb <- read.table(text="id year bully1 bully2 bully3 cbully bully_ever
100 1 NA 1 NA 1 1
100 2 1 1 NA 1 1
100 3 NA 0 NA 0 1
101 1 NA NA 1 1 1
102 1 NA 1 NA 1 1
102 2 NA NA NA NA 1
102 3 NA 1 1 1 1
102 4 0 0 0 0 1
103 1 NA 1 NA 1 1
103 2 NA 0 0 0 1", header=TRUE)
Where bully1, bully2, and bully3 are binary variables that each = 1 if bullying was reported on the respective measure. cbully is binary and = 1 if any of the 3 bullying variables = 1 for a given year. bully_ever is binary and = 1 if bullying was reported on any measure in any year for a given participant.
I want to create a new binary variable in my df called bully_past. bully_past represents the case when cbully = 1 in ANY PAST YEAR. This is subtly different from bully_ever. For example, if a participant has been assessed 4 times:
I have tried quite a few things, but the most recent rendition is the following:
fbwb <- fbwb %>%
  dplyr::group_by(id) %>%
  dplyr::mutate(bully_past = case_when(cbully == 1 & year == (year - 1) |
                                         cbully == 1 & year == (year - 2) |
                                         cbully == 1 & year == (year - 3) |
                                         cbully == 1 & year == (year - 4) |
                                         cbully == 1 & year == (year - 5) ~ 1,
                                       (is.na(cbully) & year == (year - 1) &
                                          is.na(cbully) & year == (year - 2) &
                                          is.na(cbully) & year == (year - 3) &
                                          is.na(cbully) & year == (year - 4) &
                                          is.na(cbully) & year == (year - 5)) ~ NA_real_,
                                       TRUE ~ 0)) %>%
  dplyr::ungroup()
This does not work because the syntax for indicating which years to use is not correct - so it generates a column of NA values. I have made other attempts, but I have not been able to manage to take into account observations from ALL PREVIOUS YEARS.
It can be done in Stata using this code:
gen bullyingever = bullying
sort iid time
replace bullyingever = 1 if bullying[_n - 1]==1 & iid[_n - 1]==iid
replace bullyingever = 1 if bullying[_n - 2]==1 & iid[_n - 2]==iid
replace bullyingever = 1 if bullying[_n - 3]==1 & iid[_n - 3]==iid
replace bullyingever = 1 if bullying[_n - 4]==1 & iid[_n - 4]==iid
replace bullyingever = 1 if bullying[_n - 5]==1 & iid[_n - 5]==iid
I appreciate any input on how to accomplish this in R, preferably using dplyr.
We can write a helper function that combines cumsum() and lag(). cumsum() keeps a running count of events, which lets you look into the past, and lag() shifts the result down by one row so that only years before the current one are considered. So we have
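had_previous_event <- function(x) {
  # cumsum(...) > 0 is TRUE from the first row where x == 1 onward (NA counts as no event);
  # dplyr::lag() shifts that by one row, so only earlier rows are taken into account
  dplyr::lag(cumsum(!is.na(x) & x == 1) > 0)
}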
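To see what each piece contributes, the steps inside the helper can be run one at a time on a single participant's values. This is only a minimal sketch: the vector x below is made up for illustration and does not come from the data in the question.

```r
library(dplyr)

# Hypothetical cbully values for one participant, already ordered by year
x <- c(0, NA, 1, 0, NA)

event <- !is.na(x) & x == 1   # FALSE FALSE  TRUE FALSE FALSE  (NA treated as "no event")
seen  <- cumsum(event) > 0    # FALSE FALSE  TRUE  TRUE  TRUE  (event now or earlier)
past  <- dplyr::lag(seen)     #    NA FALSE FALSE  TRUE  TRUE  (event strictly earlier)
```

The last step is what separates "past" from "ever": the year in which the event first occurs is still FALSE, and only later years become TRUE.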
You can then use that in your dplyr chain
library(dplyr)

fbwb %>%
  arrange(id, year) %>%   # make sure years are in order within each participant
  group_by(id) %>%
  mutate(bully_past = had_previous_event(cbully))
This returns TRUE/FALSE; if you want 0/1 instead, change the last line to
mutate(bully_past = as.numeric(had_previous_event(cbully)))
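Note that lag() leaves the first row of each participant as NA, because there is no earlier year to look at. If you would rather have 0 there, one option is the default argument of dplyr::lag(). The sketch below assumes that "no earlier assessment" should count as 0, which the question does not state. The name had_previous_event0 is introduced here purely for illustration, and the data is the example from the question reduced to the columns that are needed.

```r
library(dplyr)

# Example data from the question, keeping only id, year and cbully
fbwb <- read.table(text = "id year cbully
100 1 1
100 2 1
100 3 0
101 1 1
102 1 1
102 2 NA
102 3 1
102 4 0
103 1 1
103 2 0", header = TRUE)

# Variant of the helper (hypothetical name): the first row of each group
# gets FALSE instead of NA because of default = FALSE
had_previous_event0 <- function(x) {
  dplyr::lag(cumsum(!is.na(x) & x == 1) > 0, default = FALSE)
}

res <- fbwb %>%
  arrange(id, year) %>%
  group_by(id) %>%
  mutate(bully_past = as.numeric(had_previous_event0(cbully))) %>%
  ungroup()

res$bully_past
# 0 1 1 0 0 1 1 1 0 1
```

With the original helper, the same rows would read NA 1 1 NA NA 1 1 1 NA 1, so the only difference is how each participant's first assessment is coded.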