I'm currently new to R and I'm trying to make a script for cleaning messy dates. Basically, I have a linelist of deaths and it has a column called "Date of Death". This column contains dates that are not uniformly formatted so I'd like to parse and format the dates into a uniform date format however, I'm getting the following error (cleanlist object not found) however, when I manually try to run the cleanlist code, it runs without any issues. I just get the issue below if I try to run the whole script. Also, I want to know if how would I properly use the mutate function to update the date_of_death column in the clean_list table/object.
# Loading packages --------------------------------------------------------
# Checks if package is installed, installs if necessary, and loads package for current session
pacman::p_load(
lubridate, # general package for handling and converting dates
parsedate, # has function to "guess" messy dates
here, # file management
rio, # data import/export
janitor, # data cleaning
epikit, # for creating age categories
tidyverse, # data management and visualization
magrittr,
dplyr,
reprex, # minimal repr example
datapasta # sample data
)
datecolumn <- "date_of_death"
# Data Import -------------------------------------------------------------
linelist <- data.frame(
stringsAsFactors = FALSE,
check.names = FALSE,
`Date of Death` = c("45236","45212","45152",
"JANUARY 19, 2023","June 25, 2023","45200","45164",
"5/16/2023","45277","44930"))
# Data Cleaning -----------------------------------------------------------
cleanlist <- linelist %>% # the raw dataset
clean_names() %>% # automatically clean column names
# Format Dates ------------------------------------------------------------
# parse the date values
mutate(parsedate::parse_date(cleanlist[[datecolumn]]))
#> Error in `mutate()`:
#> ℹ In argument: `parsedate::parse_date(cleanlist[[datecolumn]])`.
#> Caused by error:
#> ! object 'cleanlist' not found
Created on 2024-08-12 with reprex v2.1.1
A base R approach. grepl
identifies non-numeric dates by looking for ,
or /
.
data.frame(df,
cleaned_date_of_death = as.Date(sapply(df$'Date of Death', \(x)
ifelse(grepl(",|/", x), as.Date(x, tryFormats = c("%m/%d/%Y", "%B %d, %Y")),
as.Date(as.numeric(x), origin = "1899-12-30")))),
row.names = NULL)
Date.of.Death cleaned_date_of_death
1 45236 2023-11-06
2 45212 2023-10-13
3 45152 2023-08-14
4 JANUARY 19, 2023 2023-01-19
5 June 25, 2023 2023-06-25
6 45200 2023-10-01
7 45164 2023-08-26
8 5/16/2023 2023-05-16
9 45277 2023-12-17
10 44930 2023-01-04