I have a dataset that looks like this but with much more t_i columns:
df <- data.frame("id" = c(1,2,3,4), "date" = c(4,8,2,5),
"t_1" = c(NA,9,NA,4), "t_2" = c(NA,10,NA,5),
"t_3" = c(NA,12,NA,6),"t_4" = c(NA,16,NA,8))
# id date t_1 t_2 t_3 t_4
# 1 4 NA NA NA NA
# 2 8 9 10 12 NA
# 3 2 NA NA NA NA
# 4 5 4 5 6 8
In my data, for each individuals "id", I have a date in the column "date" and I have other dates in columns "t_1,...,t_x" that are in chronological order. When one t_i is NA then all subsequent t_i+1,...,t_x are NA (non available). In the example data above, I chose to replace the dates by numeric values to keep it simple.
What I want: create a column to my data, let's say "entry" that takes the first value of the columns t_1, ...,t_x for which "date"<"t_i" else NA.
In my example, the result would be:
df <- data.frame("id" = c(1,2,3,4), "date" = c(4,8,2,5),
"entry"=c(NA,9,NA,6),"t_1" = c(NA,9,NA,4),
"t_2" = c(NA,10,NA,5),"t_3" = c(NA,12,NA,6),
"t_4" = c(NA,NA,NA,8))
# id date entry t_1 t_2 t_3 t_4
# 1 4 NA NA NA NA NA
# 2 8 9 9 10 12 NA
# 3 2 NA NA NA NA NA
# 4 5 6 4 5 6 8
I have the feeling that I could use across
or if_any
or if_all
in dplyr mutate but I can't find the right code! Your help would be more than welcome!
Thank you
You can first reshape your data into a "long" format, then use some indexing to get the correct "t_i" values for entry
by id
, then transform back to your "wide" format.
library(tidyverse)
df %>%
pivot_longer(c(-1, -2)) %>%
mutate(entry = min(value[date < value]), .by = id) %>%
pivot_wider()
# A tibble: 4 × 7
id date entry t_1 t_2 t_3 t_4
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 4 NA NA NA NA NA
2 2 8 9 9 10 12 16
3 3 2 NA NA NA NA NA
4 4 5 6 4 5 6 8