Validate that a field has type Date with package `validate`


I would like to test my data.frame using the validate package. It has some Date columns. How do I check that these columns have the right type?

So far I have tried:

library(validate)
dat <- data.frame(
    when=seq(as.Date(Sys.time()), length.out=5, by="1 day"),
    value=runif(5)  
)
rules <- validator(
    "when" %in% names(.),
    inherits(when, "POSIXct"), # does not work: "Invalid syntax detected"
    "value" %in% names(.),
    is.numeric(value)
)
confront(dat, rules)

but this results in a warning that the second rule is ignored due to invalid syntax.

EDIT: I also tried

is_date <- function(...) is(..., "Date")
rules <- validator(
    inherits(when, "POSIXct"), 
    is(when, "Date"),
    is_date(when)
)

and all of them fail with an "invalid syntax" error.

Maybe validator() is not intended to do column type checks?


Solution

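  • validator() only accepts a limited set of function calls as rules, but any function whose name starts with is. is treated as a type check. That is why inherits(), is() and is_date() are rejected, while lubridate's is.Date() works: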

    library(validate)
    library(lubridate)
    
    rules1 <- validator(
      is.numeric(value),
      !is.na(when),       # No missing dates
      is.Date(when)       # Column has class Date
    )
    
    summary(confront(dat, rules1))
      name items passes fails nNA error warning        expression
    1   V1     1      1     0   0 FALSE   FALSE is.numeric(value)
    2   V2     5      5     0   0 FALSE   FALSE      !is.na(when)
    3   V3     1      1     0   0 FALSE   FALSE     is.Date(when)
    

    If you don't want to depend on lubridate, define your own is.-prefixed check:

    is.Date.column <- function(x) inherits(x, "Date")  # accepted because the name starts with "is."
    
    rules2 <- validator(
      is.numeric(value),
      !is.na(when),
      is.Date.column(when)
    )
    summary(confront(dat, rules2))
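To see the custom check actually catch a problem, here is a minimal sketch that confronts the same rules with a hypothetical data frame `bad` whose dates were read in as character strings instead of Date. It assumes the validate package is installed; `bad` and its values are made up for illustration.

```r
library(validate)

# Custom type check; validator() accepts it because the name starts with "is."
is.Date.column <- function(x) inherits(x, "Date")

rules2 <- validator(
  is.numeric(value),
  !is.na(when),
  is.Date.column(when)
)

# Hypothetical input: dates stored as character strings, not as Date
bad <- data.frame(
  when  = c("2024-01-01", "2024-01-02", "2024-01-03"),
  value = c(0.1, 0.5, 0.9),
  stringsAsFactors = FALSE
)

# One row per rule (V1, V2, V3), in the order they were defined
res <- summary(confront(bad, rules2))
res[, c("name", "items", "passes", "fails")]
```

The third rule should report one failure while the first two still pass. Converting the column first with bad$when <- as.Date(bad$when) makes the type check pass as well.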
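Separately, the question's first rule, inherits(when, "POSIXct"), would not have passed even if validator() had accepted it: as.Date(Sys.time()) produces a Date, not a date-time. A quick base-R check using the question's data frame (no validate needed); `dat2` is a hypothetical variant added for comparison:

```r
# Rebuild the question's data frame: as.Date() turns Sys.time() into a Date
dat <- data.frame(
  when  = seq(as.Date(Sys.time()), length.out = 5, by = "1 day"),
  value = runif(5)
)
class(dat$when)                  # "Date"
inherits(dat$when, "Date")       # TRUE
inherits(dat$when, "POSIXct")    # FALSE

# Keeping Sys.time() without as.Date() gives a date-time column instead
dat2 <- data.frame(
  when  = seq(Sys.time(), length.out = 5, by = "1 day"),
  value = runif(5)
)
class(dat2$when)                 # "POSIXct" "POSIXt"
inherits(dat2$when, "POSIXct")   # TRUE
```

So check for class Date when the column was built with as.Date(), and for POSIXct when it holds date-times.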