Hi everyone, I have this dataset:
library(dplyr)
library(tidyr)
input <- tribble(
~member_id, ~fill_date , ~drug, ~days_supply,
"603", "02/17/2005", "a", 30,
"603", "06/13/2005", "a", 30,
"603", "08/11/2005", "a", 30,
"603", "06/12/2006", "b", 15,
"603", "05/09/2006", "b", 30
)
I am trying to create a variable called "time" which counts how many times each value of "drug" has appeared so far. So the output should look like this:
output <- tribble(
~member_id, ~fill_date , ~drug, ~days_supply, ~time,
"603", "02/17/2005", "a", 30, 1,
"603", "06/13/2005", "a", 30, 2,
"603", "08/11/2005", "a", 30, 3,
"603", "06/12/2006", "b", 15, 1,
"603", "05/09/2006", "b", 30, 2
)
In other words, I'm looking for a sort of loop that resets every time the "drug" variable changes. I've tried this code:
time <- 1
i <- 2
j <- 1
while (i <= nrow(input)) {
  if (input[i, 3, drop = TRUE] == input[i - 1, 3, drop = TRUE]) {
    j <- i
    time <- c(time, j)
  } else {
    j <- 1
    time <- c(time, j)
  }
  i <- i + 1
}
But of course it doesn't work, since i cannot be reset: it is the row index and is used to check the condition at the same time.
Thank you for your help
You're using dplyr, so use group_by, not a loop.
input %>% group_by(drug) %>% mutate(time = 1:n())
You probably want to add member_id to the group_by as well, but since you don't mention it, I haven't included it. If so, just use group_by(drug, member_id) instead of group_by(drug).
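For reference, here is a minimal end-to-end sketch of the grouped approach on the data from the question. It assumes you do want to group by member_id as well as drug, and it swaps in row_number() for 1:n() (both are dplyr functions and give the same numbering here). The second half is a hypothetical fix of the loop from the question, included only to show what went wrong: the counter j has to be incremented, not set to the row index i. The names result and time_loop are made up for this example, and the loop only compares consecutive values of drug, so it assumes rows for the same drug are adjacent.

```r
library(dplyr)

# Same data as in the question (tribble() is the current name for frame_data())
input <- tribble(
  ~member_id, ~fill_date,   ~drug, ~days_supply,
  "603",      "02/17/2005", "a",   30,
  "603",      "06/13/2005", "a",   30,
  "603",      "08/11/2005", "a",   30,
  "603",      "06/12/2006", "b",   15,
  "603",      "05/09/2006", "b",   30
)

# Grouped approach: number the rows within each member_id/drug group,
# in the order the rows appear in the data
result <- input %>%
  group_by(member_id, drug) %>%
  mutate(time = row_number()) %>%
  ungroup()

# Loop approach, for comparison: keep a separate counter j and
# increment it while drug stays the same, resetting it to 1 on a change
time_loop <- 1
j <- 1
for (i in seq_len(nrow(input))[-1]) {
  if (input$drug[i] == input$drug[i - 1]) {
    j <- j + 1
  } else {
    j <- 1
  }
  time_loop <- c(time_loop, j)
}

print(result)
print(time_loop)
```

Both versions should give the same time column for this data, but the grouped version does not depend on rows for the same drug being next to each other.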