I am working with a data set of patients' health state over time. I would like to compute the data frame of transitions from the current health state to the next health state.
Here is an example where the health state is measured only by AFP level and weight. The health state measurements might look like the following:
x <- data.frame(id = c(1, 1, 1, 2, 2, 2),
day = c(1, 2, 3, 1, 2, 3),
event = c('status', 'status', 'death', 'status', 'status', 'status'),
afp = c(10, 50, NA, 20, 30, 40),
weight = c(100, 105, NA, 200, 200, 200))
The desired output looks like the following:
y <- data.frame(id = c(1, 1, 2, 2),
current_afp = c(10, 50, 20, 30),
current_weight = c(100, 105, 200, 200),
next_event = c('status', 'death', 'status', 'status'),
next_afp = c(50, NA, 30, 40),
next_weight = c(105, NA, 200, 200))
One inefficient way to obtain the output is:
Is there a more efficient way to obtain the output?
Note: The real measurements data frame can have more than 10 columns, so it is not very efficient from a lines of code perspective to explicitly write
current_afp = x$afp[1:(n-1)],
next_afp = x$afp[2:n]
...
and so on.
You could try:
library(dplyr)
x %>%
mutate_each(funs(lead(.)), -id, -day) %>%
full_join(x, ., by = c("id", "day")) %>%
select(-event.x) %>%
setNames(c(names(.)[1:2],
paste0("current_", sub("\\..*","", names(.)[3:4])),
paste0("next_", sub("\\..*","", names(.)[5:7])))) %>%
group_by(id) %>%
filter(day != last(day))
Which gives:
# id day current_afp current_weight next_event next_afp next_weight
#1 1 1 10 100 status 50 105
#2 1 2 50 105 death NA NA
#3 2 1 20 200 status 30 200
#4 2 2 30 200 status 40 200