I want to reshape my data from long to wide format and use the factor levels as binary variables: if a factor level occurs at least once for a patient, the variable should be 1, otherwise 0. In addition, I want the dates as values of variables such as date.COPD.1, date.COPD.2, ...
What I have is the following:
data_sample <- data.frame(
PatID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L),
date = c("2016-12-14", "2017-02-04", "NA", "NA", "2012-27-03", "2012-04-21", "2010-02-03", "2011-03-05", "2014-08-25"),
status = c("COPD", "CPOD", "NA", "NA", "Cardio", "CPOD", "Cardio", "Cardio", "Cerebro")
)
What I want is:
PatID COPD Cardio Cerebro date.COPD.1 date.COPD.2 date.Cardio.1 date.Cardio.2 date.Cerebro.1
1     1    0      0       2016-12-14  2017-02-04  NA            NA            NA
2     0    1      0       NA          NA          2012-03-27    NA            NA
3     1    1      1       2012-04-21  NA          2010-02-03    2011-03-05    2014-08-25
There are a few steps to take, but this should give you the desired output.
Note, however, that there seems to be a typo in the input data: I assume you meant "COPD" instead of "CPOD", because that is what your expected output shows.
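A quick base R check can surface this kind of typo before reshaping. The sketch below uses hypothetical vectors copied from the question; parsing the dates with an explicit format also flags "2012-27-03", which the expected output lists as 2012-03-27.

```r
# Hypothetical vectors copied from the question's data
dates <- c("2016-12-14", "2017-02-04", "2012-27-03", "2012-04-21")
status <- c("COPD", "CPOD", "Cardio", "COPD")

# With an explicit format, an impossible month (27) is parsed as NA
parsed <- as.Date(dates, format = "%Y-%m-%d")
print(dates[is.na(parsed)])

# Listing the distinct values makes a misspelling such as "CPOD" easy to spot
print(sort(unique(status)))
```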
The first step is to turn the string "NA" into an explicit missing value, i.e. NA.
data_sample[data_sample == "NA"] <- NA
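Why this step matters: a string "NA" is an ordinary value, so is.na() does not flag it, and a filter such as !is.na(date) would keep those rows. A small base R illustration with a hypothetical vector x:

```r
x <- c("2016-12-14", "NA", "2012-04-21")
print(is.na(x))   # FALSE FALSE FALSE: "NA" is just a string here

# Replace the string with a real missing value
x[x == "NA"] <- NA
print(is.na(x))   # FALSE TRUE FALSE: now it is recognised as missing
```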
Now use data.table::dcast for the reshaping.
library(data.table)
setDT(data_sample)
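# number repeated statuses within each patient (1, 2, ...)
data_sample[, id := rowid(status), by = PatID]
# 0/1 indicator: 1 if the status occurs at least once for the patient, 0 otherwise
dt1 <- dcast(data_sample[!is.na(date)], PatID ~ status, value.var = "id", fun.aggregate = function(x) +any(x))
# one date column per status and occurrence, e.g. date_COPD_1, date_COPD_2
dt2 <- dcast(data_sample[!is.na(date)], PatID ~ paste0("date_", status) + id, value.var = "date")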
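To see what the two helper expressions compute, here is a base R sketch (not data.table) of the same ideas: ave() numbers repeated statuses within each patient, like rowid(status) by PatID, and table() > 0 gives the 0/1 indicators that +any(x) produces in dcast. The data frame d is a hypothetical, hand-copied cleaned version of data_sample.

```r
# Hypothetical cleaned copy of the sample: "NA" rows dropped, "CPOD" read as "COPD"
d <- data.frame(
  PatID  = c(1L, 1L, 2L, 3L, 3L, 3L, 3L),
  status = c("COPD", "COPD", "Cardio", "COPD", "Cardio", "Cardio", "Cerebro"),
  stringsAsFactors = FALSE
)

# Like rowid(status) by PatID: a running count per PatID/status pair
d$id <- ave(seq_along(d$status), d$PatID, d$status, FUN = seq_along)
print(d$id)  # 1 2 1 1 1 2 1

# Like +any(x): 1 if the status occurs at least once for the patient, else 0
flags <- +(table(d$PatID, d$status) > 0)
print(flags)
```

The column order of the table depends on the locale's sort order, so look values up by name rather than by position.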
Finally, join both data.tables on PatID:
out <- dt1[dt2, on = 'PatID']
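dt1[dt2, on = 'PatID'] looks up each row of dt2 in dt1, so every patient in dt2 is kept (a right join). For comparison, base R's merge() with all.y = TRUE gives the same rows and values when PatID is unique in both tables, although merge() also sorts by the key. A sketch with small hypothetical tables:

```r
# Two small hypothetical tables keyed by PatID (values copied from this thread)
flags <- data.frame(PatID = 1:3, COPD = c(1L, 0L, 1L))
dates <- data.frame(
  PatID = c(3L, 1L, 2L),
  date_COPD_1 = c("2012-04-21", "2016-12-14", NA),
  stringsAsFactors = FALSE
)

# Base R counterpart of flags[dates, on = "PatID"]: keep every row of `dates`
joined <- merge(flags, dates, by = "PatID", all.y = TRUE)
print(joined)
```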
out
# PatID Cardio Cerebro COPD date_COPD_1 date_COPD_2 date_Cardio_1 date_Cardio_2 date_Cerebro_1
#1: 1 0 0 1 2016-12-14 2017-02-04 <NA> <NA> <NA>
#2: 2 1 0 0 <NA> <NA> 2012-27-03 <NA> <NA>
#3: 3 1 1 1 2012-04-21 <NA> 2010-02-03 2011-03-05 2014-08-25
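Apart from the date typo, this differs from the expected output only in the column names (underscores instead of dots) and the order of the indicator columns. If the exact names from the question matter, a base R sketch could look like this; out_df is a hypothetical stand-in with the same columns, its values copied from the printed result:

```r
out_df <- data.frame(
  PatID = 1:3, Cardio = c(0L, 1L, 1L), Cerebro = c(0L, 0L, 1L), COPD = c(1L, 0L, 1L),
  date_COPD_1 = c("2016-12-14", NA, "2012-04-21"),
  date_COPD_2 = c("2017-02-04", NA, NA),
  date_Cardio_1 = c(NA, "2012-27-03", "2010-02-03"),
  date_Cardio_2 = c(NA, NA, "2011-03-05"),
  date_Cerebro_1 = c(NA, NA, "2014-08-25"),
  stringsAsFactors = FALSE
)

# Use "." instead of "_" to follow the date.COPD.1 pattern from the question
names(out_df) <- gsub("_", ".", names(out_df), fixed = TRUE)

# Put the indicator columns in the question's order: COPD, Cardio, Cerebro
date_cols <- grep("^date\\.", names(out_df), value = TRUE)
out_df <- out_df[c("PatID", "COPD", "Cardio", "Cerebro", date_cols)]
print(names(out_df))
```

On the data.table itself, data.table's setnames() and setcolorder() do the same renaming and reordering by reference, without copying.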
Data (with "CPOD" corrected to "COPD"):
data_sample <- data.frame(
PatID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L),
date = c("2016-12-14", "2017-02-04", "NA", "NA", "2012-27-03", "2012-04-21", "2010-02-03", "2011-03-05", "2014-08-25"),
status = c("COPD", "COPD", "NA", "NA", "Cardio", "COPD", "Cardio", "Cardio", "Cerebro")
)