I am trying to calculate age from two date columns. First, I convert any invalid dates of birth (dob) to NA. Next, I try to calculate age using lubridate (solution from: https://stackoverflow.com/a/41730322/8772229), but I get an error message. Any advice on what is going wrong?
Data:
df <- data.frame(dob=as.Date(c("2020-09-26", "2017-12-01", NA)), today=as.Date(c("2020-09-25", "2020-09-25", "2020-09-25")))
df
         dob      today
1 2020-09-26 2020-09-25
2 2017-12-01 2020-09-25
3       <NA> 2020-09-25
Code:
library(dplyr)      # provides %>%, mutate() and case_when()
library(lubridate)
df %>%
  mutate(
    # convert non-plausible dates to NA
    dob = case_when(dob > today ~ as.Date(NA_character_), TRUE ~ as.Date(dob)),
    # calculate age
    age = year(as.period(interval(start = dob, end = today)))
  )
Message:
Error in FUN(X[[i]], ...) : subscript out of bounds
For me it gives a different error, which comes from trying to extract the year value from an NA period. You can use the time_length function from lubridate to get the difference in years instead.
library(dplyr)
library(lubridate)
df %>%
  mutate(dob = replace(dob, dob > today, NA),
         age = time_length(today - dob, 'years'))
#         dob      today      age
#1       <NA> 2020-09-25       NA
#2 2017-12-01 2020-09-25 2.817248
#3       <NA> 2020-09-25       NA
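If you want age in completed years rather than a fractional value, one option is to floor the result. The following is only a sketch building on the code above: the column names age_exact and age_years are made up for illustration, and it assumes that dividing the day difference by an average year length is precise enough for your data (it can be off by a day close to a birthday).

```r
library(dplyr)
library(lubridate)

# Same example data as in the question
df <- data.frame(
  dob   = as.Date(c("2020-09-26", "2017-12-01", NA)),
  today = as.Date(c("2020-09-25", "2020-09-25", "2020-09-25"))
)

result <- df %>%
  mutate(
    # dates of birth in the future are not plausible, so set them to NA
    dob = replace(dob, dob > today, NA),
    # fractional age in years; NA wherever dob is NA
    age_exact = time_length(today - dob, "years"),
    # completed years (hypothetical column name); NA stays NA
    age_years = floor(age_exact)
  )

result
```

Here age_years is NA for rows 1 and 3 and 2 for row 2, since floor() passes NA through unchanged.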