I have a dataframe with two columns as below.
nct_id recrstdt check start_date sdt
NCT02277743 2014-12-21 2 2014-10-31 16425
NCT02277769 2014-12-21 2 2014-11-30 16425
NCT03131648 2017-06-08 2 2017-05-30 17325
NCT03160885 2017-07-14 2 2017-06-12 17361
NCT03349060 2018-01-17 2 2017-12-07 17548
NCT03569293 2018-08-21 2 2018-08-13 17764
NCT03575871 2018-07-12 2 2018-06-29 17724
NCT03607422 2018-08-09 2 2018-07-27 17752
NCT04146363 <NA> 2 2019-09-24 Inf
NCT04162769 <NA> 2 2019-10-04 Inf
NCT04178967 <NA> 2 2019-10-29 Inf
structure(list(nct_id = c("NCT02277743", "NCT02277769", "NCT03131648",
"NCT03160885", "NCT03349060", "NCT03569293", "NCT03575871", "NCT03607422",
"NCT04146363", "NCT04162769", "NCT04178967"), recrstdt = structure(c(16425,
16425, 17325, 17361, 17548, 17764, 17724, 17752, Inf, Inf, Inf
), class = "Date"), check = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
), start_date = structure(c(16374, 16404, 17316, 17329, 17507,
17756, 17711, 17739, 18163, 18173, 18198), class = "Date"), sdt = c(16425,
16425, 17325, 17361, 17548, 17764, 17724, 17752, Inf, Inf, Inf
)), row.names = c(NA, -11L), class = "data.frame")
I am using the following code to create column sdt which picks the first non-missing from recstdt or start_date in that order. However when recrstdt is NA , start_date is not being picked.
Class of recrstdt is "Date"
mutate(check=ifelse(is.na(recrstdt), 1, 2),
sdt=ifelse(!is.na(recrstdt), recrstdt, start_date))
Any help to figure this out is much appreciated.
Since is.finite(NA) == FALSE
, we can take account for both Inf
or NA
. Gives Inf
if both are Inf
or NA
.
> dat |>
+ transform(check=replace(check, !is.finite(recrstdt), 1),
+ sdt=as.Date(mapply(min, recrstdt, start_date, na.rm=TRUE)))
nct_id recrstdt check start_date sdt
1 NCT02277743 2014-12-21 2 2014-10-31 2014-10-31
2 NCT02277769 2014-12-21 2 2014-11-30 2014-11-30
3 NCT03131648 2017-06-08 2 2017-05-30 2017-05-30
4 NCT03160885 2017-07-14 2 2017-06-12 2017-06-12
5 NCT03349060 2018-01-17 2 2017-12-07 2017-12-07
6 NCT03569293 2018-08-21 2 2018-08-13 2018-08-13
7 NCT03575871 2018-07-12 2 2018-06-29 2018-06-29
8 NCT03607422 2018-08-09 2 2018-07-27 2018-07-27
9 NCT04146363 Inf 1 2019-09-24 2019-09-24
10 NCT04162769 Inf 1 Inf Inf
11 NCT04178967 <NA> 1 2019-10-29 2019-10-29
Data:
> dput(dat)
structure(list(nct_id = c("NCT02277743", "NCT02277769", "NCT03131648",
"NCT03160885", "NCT03349060", "NCT03569293", "NCT03575871", "NCT03607422",
"NCT04146363", "NCT04162769", "NCT04178967"), recrstdt = structure(c(16425,
16425, 17325, 17361, 17548, 17764, 17724, 17752, Inf, Inf, NA
), class = "Date"), check = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
), start_date = structure(c(16374, 16404, 17316, 17329, 17507,
17756, 17711, 17739, 18163, Inf, NA), class = "Date"), sdt = c(16425,
16425, 17325, 17361, 17548, 17764, 17724, 17752, Inf, Inf, Inf
)), row.names = c(NA, -11L), class = "data.frame")