I would like to test my data.frame using the validate package. The data frame has some Date columns. How do I check that such a column has the right type?
So far I have tried:
library(validate)
dat <- data.frame(
when=seq(as.Date(Sys.time()), length.out=5, by="1 day"),
value=runif(5)
)
rules <- validator(
"when" %in% names(.),
inherits(when, "POSIXct"), # does not work: "Invalid syntax detected"
"value" %in% names(.),
is.numeric(value)
)
confront(dat, rules)
This results in a warning that the second rule is ignored due to invalid syntax.
EDIT: I also tried
is_date <- function(...) is(..., "Date")
rules <- validator(
inherits(when, "POSIXct"),
is(when, "Date"),
is_date(when)
)
All of these attempts fail with an "invalid syntax" error.
Maybe validator() is not intended to do column type checks?
This should give you what you want:
library(lubridate)
rules1 <- validator(
is.numeric(value),
!is.na(when), # Check value validity
is.Date(when) # Check column type
)
summary(confront(dat, rules1))
  name items passes fails nNA error warning        expression
1   V1     1      1     0   0 FALSE   FALSE is.numeric(value)
2   V2     5      5     0   0 FALSE   FALSE      !is.na(when)
3   V3     1      1     0   0 FALSE   FALSE     is.Date(when)
If you don't want to use lubridate, try
is.Date.column <- function(x) inherits(x, "Date")
rules2 <- validator(
is.numeric(value),
!is.na(when),
is.Date.column(when)
)
summary(confront(dat, rules2))
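As far as I understand validate's rule syntax, it accepts calls to functions whose names start with is., which would explain why wrapping inherits() in is.Date.column works while calling inherits() directly does not. Note also that the when column in the question is a Date, not a POSIXct, so a POSIXct check would fail on it even with accepted syntax. A minimal base-R sketch to confirm the class before writing the rule (is.POSIXct.column is a hypothetical helper name, not part of any package):

```r
# Same kind of column as in the question: `when` has class Date.
dat <- data.frame(
  when  = seq(Sys.Date(), length.out = 5, by = "1 day"),
  value = runif(5)
)

# Thin wrappers around inherits(); the "is." prefix is deliberate.
# is.POSIXct.column is a hypothetical name chosen for illustration.
is.Date.column    <- function(x) inherits(x, "Date")
is.POSIXct.column <- function(x) inherits(x, "POSIXct")

class(dat$when)                          # inspect the actual class
is.Date.column(dat$when)                 # TRUE: the column is a Date
is.POSIXct.column(dat$when)              # FALSE: a Date is not a POSIXct
is.POSIXct.column(as.POSIXct(dat$when))  # TRUE after explicit conversion
```

If the column really should hold date-times, convert it with as.POSIXct() first and use the POSIXct helper in the rule instead.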