So I am trying this code, which I have used in the past with other data wrangling tasks with no errors:
## Create an age_at_enrollment variable, based on the start_date per individual (i.e. I want to know an individual's age, when they began their healthcare job).
complete_dataset_1 = complete_dataset %>% mutate(age_at_enrollment = (as.Date(start_date)-as.Date(birth_date))/365.25)
However, I keep receiving this error message: "Error in charToDate(x) : character string is not in a standard unambiguous format"
I believe this error is happening because in the administrative dataset that I am using, the start_date and birth_date variables are formatted in an odd way:
start_date     birth_date
2/5/07 0:00    2/28/1992 0:00
I could not find an explanation for why the data is formatted that way. Any thoughts on how to fix this issue without altering the original administrative dataset?
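By default, as.Date only tries the formats %Y-%m-%d and %Y/%m/%d, so a string such as 2/28/1992 0:00 is rejected as ambiguous: R cannot tell, for instance, whether the day or the month comes first. To resolve this, pass the layout explicitly through the format parameter of as.Date: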
complete_dataset_1 = complete_dataset %>%
  mutate(age_at_enrollment = (
    as.Date(start_date, format="%m/%d/%y") -   # two-digit year in the sample (2/5/07), so %y
    as.Date(birth_date, format="%m/%d/%Y")     # four-digit year in the sample (2/28/1992), so %Y
  ) / 365.25)
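Before running this on the whole dataset, it may help to try the format strings on the two sample values from the question. The following is a minimal base R sketch; it assumes every row follows the same layout as the sample (two-digit year in start_date, four-digit year in birth_date), which is worth verifying on the real data.

```r
# Sample values copied from the question (assumed representative of all rows).
start_raw <- "2/5/07 0:00"
birth_raw <- "2/28/1992 0:00"

# %y reads a two-digit year, %Y a four-digit year. Characters left over after
# the format has been matched (here " 0:00") are ignored by as.Date.
start <- as.Date(start_raw, format = "%m/%d/%y")
birth <- as.Date(birth_raw, format = "%m/%d/%Y")

print(start)  # expected: 2007-02-05
print(birth)  # expected: 1992-02-28

# Strings that do not match the format become NA instead of raising an error,
# so counting NAs after the conversion is a cheap sanity check.
n_failed <- sum(is.na(c(start, birth)))
print(n_failed)
```

Because neither conversion writes back to complete_dataset, the original administrative data stays untouched.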
A more precise way to calculate the difference in years, handling the leap year edge case, is to use the lubridate package:
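library(lubridate)
complete_dataset_1 = complete_dataset %>%
  mutate(age_at_enrollment = time_length(
    # interval() keeps both calendar endpoints, so whole years are counted exactly;
    # a plain difftime would be converted using a fixed-length average year
    interval(as.Date(birth_date, format="%m/%d/%Y"),
             as.Date(start_date, format="%m/%d/%y")),
    "years"))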
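To see why the leap year handling matters, here is a small sketch comparing division by 365.25 with an interval-based calculation in lubridate. The two dates are made up for illustration (they are not from the question's data) and are chosen so that the true age is exactly 18 years.

```r
library(lubridate)

# Hypothetical dates, exactly 18 calendar years apart.
birth <- as.Date("2000-03-01")
start <- as.Date("2018-03-01")

# Fixed-length year: the number of elapsed days divided by 365.25.
days_elapsed <- as.numeric(start - birth)
age_approx <- days_elapsed / 365.25

# Calendar-aware: an interval remembers its endpoints, so time_length()
# counts whole calendar years between them.
age_exact <- time_length(interval(birth, start), "years")

print(days_elapsed)
print(age_approx)
print(age_exact)
```

The approximation falls slightly short of 18 here, which would matter if ages are later floored or compared against a cutoff such as 18 or 65.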