I have a df (chpt4
) with 1000+ participants, and the dates when tests where taken. I would like to accomodate the dates according to how many months have passed between the follow up (t1:t4
) and the baseline (t0
). For this purpose I created 4 additional columns (difft0t2:difft0t4
) that show exaclty the months elapsed between the tests. The image is what I have now.
I am grouping the months in 5 different categories: (also I thought this vectors would help me as a counter)
FU6 <- 1:9
FU12 <- 10:18
FU24 <- 19:30
FU36 <- 31:42
FU48 <- 43:54
My original idea was to start indexing the values of the difft0t1
column, that belong to the above ranges using which()
which(chpt4$difft0t1 %in% c(FU6)) #this works
which(chpt4$difft0t1 %in% c(FU14)) #this doesn't work at all
...and use that outcome number, as an index of which element to paste into another column. Its just not working.
keeping with the image example from lines 243 and 244, I would like to outcome columns to look like this:
baseline | FU6 | FU12 | FU24 | FU36 | FU48 |
---|---|---|---|---|---|
2012-02-24 | NA | 2013-09-06 | 2014-02-21 | 2015-06-23 | NA |
2012-05-24 | NA | 2013-05-16 | NA | 2015-04-20 | 2016-05-12 |
I think you need this
library (tidyverse)
df %>% pivot_longer(cols = -id, names_to = "Test", values_to = "Dates") %>%
group_by(id) %>% mutate(new_col = as.numeric(round((Dates - first(Dates))/30,0))) %>%
mutate(new_col = case_when(new_col == 0 ~ "Baseline",
new_col %in% 1:9 ~ "FU6",
new_col %in% 10:18 ~ "FU12",
new_col %in% 19:30 ~ "FU24",
new_col %in% 31:42 ~ "FU36",
new_col %in% 43:54 ~ "FU48")) %>% filter(!is.na(new_col)) %>%
select(-Test) %>% pivot_wider(id_cols = "id", names_from = "new_col", values_from = "Dates", values_fn = min)
# A tibble: 4 x 6
# Groups: id [4]
id Baseline FU12 FU24 FU36 FU48
<chr> <date> <date> <date> <date> <date>
1 waa000 2012-10-04 2013-09-05 NA NA NA
2 waf84 2012-02-24 NA 2013-09-06 2015-06-23 NA
3 waq593 2012-05-24 2013-05-16 NA 2015-04-20 2016-05-12
4 wcu776 2013-01-24 2014-01-23 NA NA NA
NOTE whenever there will be two dates in one group, minimum/first of those will be displayed. FU6
category will automatically in picture once the appropriate data is used.
sample data used
dput(df)
> dput(df)
structure(list(id = c("waa000", "waf84", "waq593", "wcu776"),
t0 = structure(c(15617, 15394, 15484, 15729), class = "Date"),
t1 = structure(c(15953, 15954, 15841, 16093), class = "Date"),
t2 = structure(c(NA, 16122, 16545, NA), class = "Date"),
t3 = structure(c(NA, 16609, 16933, NA), class = "Date"),
t4 = structure(c(NA_real_, NA_real_, NA_real_, NA_real_), class = "Date")), row.names = c(NA,
-4L), class = "data.frame")
> df
id t0 t1 t2 t3 t4
1 waa000 2012-10-04 2013-09-05 <NA> <NA> <NA>
2 waf84 2012-02-24 2013-09-06 2014-02-21 2015-06-23 <NA>
3 waq593 2012-05-24 2013-05-16 2015-04-20 2016-05-12 <NA>
4 wcu776 2013-01-24 2014-01-23 <NA> <NA> <NA>