I would like to create a subclass of data.frame that carries around some information about the state of particular columns. I thought the best way to do this would be with an attribute, special_col. A simple constructor seems to work fine:
# Light class that keeps an attribute about a particular special column
new_my_class <- function(x, special_col) {
  stopifnot(inherits(x, "data.frame"))
  attr(x, "special_col") <- special_col
  class(x) <- c("my_class", class(x))
  x
}
my_mtcars <- new_my_class(mtcars, "mpg")
class(my_mtcars) # subclass of data.frame
#> [1] "my_class" "data.frame"
attributes(my_mtcars)$special_col # special_col attribute is still there
#> [1] "mpg"
However, I run into the problem that I need to write methods for various generics to update this attribute if the column name happens to change. As shown below, using the data.frame method leaves the attribute untouched.
library(dplyr)
# Using select to rename a column does not update the attribute
select(my_mtcars, x = mpg) %>%
  attr("special_col")
#> [1] "mpg"
Here is my current, naive attempt at a method for my_class. I capture the dots and then parse them to figure out which columns were renamed, changing the attribute if they were in fact renamed.
# I attempt to capture the dots supplied to select and replace the attribute
select.my_class <- function(.data, ...) {
  exprs <- enquos(...)
  sel <- NextMethod("select", .data)
  replace_renamed_cols(sel, "special_col", exprs)
}
# This is slightly more complex than needed here in case there is more than one special_col
replace_renamed_cols <- function(x, which, exprs) {
  att <- attr(x, which)
  renamed <- nzchar(names(exprs)) # Bool: was column renamed?
  old_names <- purrr::map_chr(exprs, rlang::as_name)[renamed]
  new_names <- names(exprs)[renamed]
  att_rn_idx <- match(att, old_names) # Which attribute columns were renamed?
  att[att_rn_idx] <- new_names[att_rn_idx]
  attr(x, which) <- att
  x
}
# This solves the immediate problem:
select(my_mtcars, x = mpg) %>%
  attr("special_col")
#> [1] "x"
Unfortunately, I think this is particularly brittle and fails in other circumstances, as shown below.
# However, this fails with other expressions:
select(my_mtcars, -cyl)
#> Error: Can't convert a call to a string
select(my_mtcars, starts_with("c"))
#> Error: Can't convert a call to a string
My feeling is that it would be preferable to get the changes in columns after tidyselect has done its work, rather than attempting to reproduce the same changes in the attributes by capturing the dots as I have done. The key question is: how can I use tidyselect tools to understand what changes are going to happen to a data frame when selecting variables? Ideally I could return something that keeps track of which columns are renamed to which others, which are dropped, etc., and use that to keep the attribute special_col up to date.
I think the way to do it is to encode your attribute updating in the [ and names<- methods; the default select method should then use these generics. This should be the case in the next major version of dplyr.
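As a rough sketch of what those two methods might look like for my_class (base R only; the bodies are my own assumption, not code taken from dplyr or tidyselect): the [ method keeps only the special columns that survive the subset, and the names<- method maps each special column to its new name by position.

```r
# Constructor from the question, repeated so the sketch runs on its own
new_my_class <- function(x, special_col) {
  stopifnot(inherits(x, "data.frame"))
  attr(x, "special_col") <- special_col
  class(x) <- c("my_class", class(x))
  x
}

# Subsetting: keep only the special columns that are still present
`[.my_class` <- function(x, ...) {
  special <- attr(x, "special_col")
  out <- NextMethod()
  # With drop = TRUE the result may be a plain vector; leave it alone
  if (!is.data.frame(out)) return(out)
  attr(out, "special_col") <- intersect(special, names(out))
  out
}

# Renaming: look up each special column's position among the old names
# and read off the new name at that position
`names<-.my_class` <- function(x, value) {
  old <- names(x)
  special <- attr(x, "special_col")
  out <- NextMethod()
  idx <- match(special, old)
  attr(out, "special_col") <- value[idx[!is.na(idx)]]
  out
}

my_mtcars <- new_my_class(mtcars, "mpg")

# Subset, then rename: the two steps a generic select could perform
renamed <- my_mtcars[c("mpg", "cyl")]
names(renamed) <- c("x", "cyl")
attr(renamed, "special_col")
#> [1] "x"

# Dropping the special column leaves an empty attribute
dropped <- my_mtcars[c("cyl", "disp")]
attr(dropped, "special_col")
#> character(0)
```

With methods like these, any code that subsets with [ and then renames with names<- should keep special_col in sync without a dedicated select method.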
See https://github.com/r-lib/tidyselect/blob/8d1c76dae81eb55032bcbd83d2d19625db887025/R/eval-select.R#L152-L156 for a preview of what select.default will look like. We might even remove tbl-df and data.frame methods from dplyr. The last line is of interest: it invokes the [ and names<- methods.
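If you want to see what a selection will do before applying it, here is a minimal sketch, assuming a version of tidyselect that exports eval_select() and that rlang is installed. eval_select() returns a named integer vector whose values are column positions and whose names are the output names, so renames and drops can be read off by comparing it with the original names. The helper selection_changes() is hypothetical, written only for illustration.

```r
library(tidyselect)

# Hypothetical helper (not part of tidyselect): summarise what a selection
# does to the columns of `.data`
selection_changes <- function(.data, ...) {
  # Named integer vector: values are positions, names are the output names
  pos <- eval_select(rlang::expr(c(...)), .data)
  old <- names(.data)[pos]
  new <- names(pos)
  is_renamed <- old != new
  list(
    kept    = stats::setNames(new, old),                        # old -> new
    renamed = stats::setNames(new[is_renamed], old[is_renamed]),
    dropped = setdiff(names(.data), old)
  )
}

# Works with renames, helpers such as starts_with(), and negation alike
changes <- selection_changes(mtcars, x = mpg, starts_with("c"))
changes$renamed
changes$dropped
```

Here changes$renamed maps mpg to x and changes$dropped lists the columns that are not selected, which is the information needed to update special_col.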