For a project I need to preproccess data from a hospital and eventually make a predictive model.
In one of my preproccessing steps, I need to make a column that represents the number of cumulative days a patient was in the hospital. This number is determined by looking at several other columns in different rows. Also, a patient can be hospitalized multiple times on different occasions. I'm sorry if this is very confusing.
I've added a picture of a dataframe. I want to know how I can make an argument with R to make the column cdays out of the column Patientid and Date. sample of my data
I've tried numerous ways to do this. Some were using for
and while
loops with counters. And others were using a nested ifelse
with new vectors (so I could compare current rows with rows of a iteration before):
#i-1 en i c.days
df$c.days <- 0
df$i_min_1c.days <- 0
#i en i+1 date
iDate<-df$Date[1:(nrow(df)-1)]
i_plus_1Date<-df$Date[2:(nrow(df))]
#i en i+1 patientid
iPatientid<-df$Patientid[1:(nrow(df)-1)]
i_plus_1Patientid<-df$Patientid[2:(nrow(df))]
newNew<-c(ifelse(iPatientid==i_plus_1Patientid, ifelse(i_plus_1Date-iDate>1,1,df$i_min_1c.days + 1), 1), df$c.days[nrow(df)])
Obviously this didn't work, but I was hopeless.... Could anyone point me in the right direction on how to proceed?
Some notes: - The complete dataframe is 800k rows long and it's 9 columns wide (keep in mind conversions will take a lot of time) - The value of cdays starts at 1 since it will be used as a multiplier, - If the date difference between the ith and i+1th is bigger than 1 day, it will be considered as a new session and the cdays value would be 1.
If you need any more information, feel free to ask. I will try my best! Thank you very much and I'm sorry for my bad English.
Given that you have a large dataset, use data.table
.
library(data.table)
setDT(df) #convert to data table
setorder(df, Patientid, Date) #am assuming your dates are R dates and not characters
df[, cdays := ifelse(
Date == shift(Date) + 1,
shift(cdays) + 1,
1
),
by=Patientid]
The question is vague enough that I may easily have misunderstood it.