
How to correctly write class methods in R6 and chain them


I need to create my own class that takes a data frame and has these methods: 'get_data' to choose the data frame, 'select' to select columns by name, and 'filter' to filter rows with certain values. Select and filter are somewhat similar to their dplyr counterparts, but I can't use dplyr.

I would like to be able to chain them like this:

result <- df_object$get_data(df)$select(col1, col2, period)$filter(period)

What can I do so that the 'filter' method filters the already selected values? Currently it filters the initial dataset. Also, how can I change the methods so that select and filter don't need a data argument? Please give me some tips; I feel like I'm doing this the wrong way. Do I need to add some fields to the class?

dataFrame <- R6Class("dataFrame",
                     list(data = "data.frame"),
                     public = list(
                       get_data = function(data) {data},
                       select_func = function(data, columns) {data[columns]},
                       filter_func = function(data, var) {data[var, ]}
                     ))
# Create new object  
df_object <- dataFrame$new()
# Call methods
df_object$get_data(df)
df_object$select_func(df, c("month", "forecast"))
df_object$filter_func(df[df$month %in% c(1, 2), ])
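The original chain breaks for a simple reason. Each method returns a plain data frame instead of the R6 object, so the next $ in the chain looks up a column rather than a method. The sketch below shows this with a cut-down version of the class. The name Original is only for illustration:

```r
library(R6)

# Cut-down version of the question's class: methods return data frames
Original <- R6Class("Original", public = list(
  get_data = function(data) data,
  select_func = function(data, columns) data[columns]
))

obj <- Original$new()
step1 <- obj$get_data(iris)

class(step1)                # "data.frame", not the R6 object
is.null(step1$select_func)  # TRUE: `$` looked for a column named "select_func"

# Chaining therefore fails with "attempt to apply non-function"
res <- try(obj$get_data(iris)$select_func("Species"), silent = TRUE)
inherits(res, "try-error")  # TRUE
```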

Solution

  • If you want to chain member functions, each of them needs to return self. By R6 convention you return invisible(self), so the object isn't printed after every call. This means the R6 object has to modify the data it holds instead of returning a new data frame. R6 objects have reference semantics, and one of their benefits is avoiding unnecessary copies. So I would keep a single full copy of the data and have select_func and filter_func update row and column indices:

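    library(R6)
    
    dataFrame <- R6Class("dataFrame",
      public = list(
        data = NULL,     # the full, untouched data frame
        rows = NULL,     # indices of the rows currently in view
        columns = NULL,  # indices of the columns currently in view
        initialize = function(data) {
          self$data <- data
          self$rows <- seq_len(nrow(data))  # seq_len() is safe for 0 rows
          self$columns <- seq_along(data)
        },
        # drop = FALSE keeps a data frame even when one column is selected
        get_data = function() {
          self$data[self$rows, self$columns, drop = FALSE]
        },
        select_func = function(cols) {
          # accept column names as well as positions
          if (is.character(cols)) cols <- match(cols, names(self$data))
          self$columns <- cols
          invisible(self)  # returning self is what makes chaining work
        },
        filter_func = function(r) {
          # accept a logical vector as well as row positions
          if (is.logical(r)) r <- which(r)
          self$rows <- r
          invisible(self)
        }
      )
    )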
    
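    The mechanism can be seen in isolation with a tiny class. Counter is a made-up name used only for this example. A method that updates a field and returns self can be chained. Because R6 objects have reference semantics, the update is visible through every name bound to the object, since no copy is made:

    ```r
    library(R6)

    # Hypothetical class: add() updates a field and returns the object itself
    Counter <- R6Class("Counter", public = list(
      n = 0,
      add = function(k = 1) {
        self$n <- self$n + k
        invisible(self)  # returning self is what makes $add()$add() possible
      }
    ))

    a <- Counter$new()
    b <- a            # copies the reference, not the object
    a$add()$add(5)
    b$n               # 6: a and b refer to the same object
    ```

    The dataFrame class applies this same pattern to its row and column indices.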

    This allows us to chain the filter and select methods:

    dataFrame$new(iris)$filter_func(1:5)$select_func(1:2)$get_data()
    #>   Sepal.Length Sepal.Width
    #> 1          5.1         3.5
    #> 2          4.9         3.0
    #> 3          4.7         3.2
    #> 4          4.6         3.1
    #> 5          5.0         3.6
    

    and our select method can take names too:

    dataFrame$new(mtcars)$select_func(c("mpg", "wt"))$get_data()
    #>                      mpg    wt
    #> Mazda RX4           21.0 2.620
    #> Mazda RX4 Wag       21.0 2.875
    #> Datsun 710          22.8 2.320
    #> Hornet 4 Drive      21.4 3.215
    #> Hornet Sportabout   18.7 3.440
    #> Valiant             18.1 3.460
    #> Duster 360          14.3 3.570
    #> Merc 240D           24.4 3.190
    #> Merc 230            22.8 3.150
    #> Merc 280            19.2 3.440
    #> Merc 280C           17.8 3.440
    #> Merc 450SE          16.4 4.070
    #> Merc 450SL          17.3 3.730
    #> Merc 450SLC         15.2 3.780
    #> Cadillac Fleetwood  10.4 5.250
    #> Lincoln Continental 10.4 5.424
    #> Chrysler Imperial   14.7 5.345
    #> Fiat 128            32.4 2.200
    #> Honda Civic         30.4 1.615
    #> Toyota Corolla      33.9 1.835
    #> Toyota Corona       21.5 2.465
    #> Dodge Challenger    15.5 3.520
    #> AMC Javelin         15.2 3.435
    #> Camaro Z28          13.3 3.840
    #> Pontiac Firebird    19.2 3.845
    #> Fiat X1-9           27.3 1.935
    #> Porsche 914-2       26.0 2.140
    #> Lotus Europa        30.4 1.513
    #> Ford Pantera L      15.8 3.170
    #> Ferrari Dino        19.7 2.770
    #> Maserati Bora       15.0 3.570
    #> Volvo 142E          21.4 2.780
    
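    One detail is worth checking in an index-based get_data(). Suppose it is written as self$data[self$columns][self$rows, ]. Then selecting a single column returns a bare vector rather than a data frame, because row-indexing a one-column data frame drops to a vector by default. Indexing rows and columns in one call with drop = FALSE avoids this. A quick check on iris:

    ```r
    # Two-step indexing: a one-column data frame collapses to a vector
    v <- iris[1][1:3, ]
    class(v)  # "numeric"

    # Rows and columns in one call with drop = FALSE keeps a data frame
    d <- iris[1:3, 1, drop = FALSE]
    class(d)  # "data.frame"
    ```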

    For completeness, you would want some type safety, for example checking that the requested columns and rows actually exist. I would also add a reset method that removes all filtering and selection. This effectively gives you a data frame where filtering and selecting are non-destructive, which could be very useful.
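    The reset method and type checks suggested above could look something like the sketch below. The class name SafeFrame, the specific checks and the error message are illustrative assumptions, not an established API:

    ```r
    library(R6)

    # Hypothetical sketch: index-based design plus reset() and argument checks
    SafeFrame <- R6Class("SafeFrame", public = list(
      data = NULL, rows = NULL, columns = NULL,
      initialize = function(data) {
        stopifnot(is.data.frame(data))
        self$data <- data
        self$reset()
      },
      # restore the full view: every row and every column
      reset = function() {
        self$rows <- seq_len(nrow(self$data))
        self$columns <- seq_along(self$data)
        invisible(self)
      },
      get_data = function() self$data[self$rows, self$columns, drop = FALSE],
      select_func = function(cols) {
        if (is.character(cols)) {
          unknown <- setdiff(cols, names(self$data))
          if (length(unknown) > 0)
            stop("Unknown column(s): ", paste(unknown, collapse = ", "))
          cols <- match(cols, names(self$data))
        }
        stopifnot(is.numeric(cols), all(cols >= 1), all(cols <= ncol(self$data)))
        self$columns <- cols
        invisible(self)
      },
      filter_func = function(r) {
        if (is.logical(r)) {
          stopifnot(length(r) == nrow(self$data))
          r <- which(r)
        }
        stopifnot(is.numeric(r), all(r >= 1), all(r <= nrow(self$data)))
        self$rows <- r
        invisible(self)
      }
    ))

    sf <- SafeFrame$new(iris)
    sf$filter_func(1:3)$select_func("Species")
    nrow(sf$get_data())          # 3
    nrow(sf$reset()$get_data())  # 150
    ```

    An invalid request such as sf$select_func("nope") stops with an error before any field is changed, so the current view stays intact.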

    Created on 2022-05-01 by the reprex package (v2.0.1)
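    In the dataFrame class, each call to filter_func() replaces the previous row selection rather than narrowing it. You may want successive filters to work on what is already in view, which is closer to what the question asks for. One option is to index into the current row indices. The sketch below assumes that approach. The class name NarrowingFrame and the predicate-function interface are hypothetical choices, not part of the answer above:

    ```r
    library(R6)

    # Hypothetical variant in which filters compose: each filter only sees the
    # rows that survived the previous ones
    NarrowingFrame <- R6Class("NarrowingFrame", public = list(
      data = NULL,
      rows = NULL,
      columns = NULL,
      initialize = function(data) {
        self$data <- data
        self$rows <- seq_len(nrow(data))
        self$columns <- seq_along(data)
      },
      get_data = function() {
        self$data[self$rows, self$columns, drop = FALSE]
      },
      select_func = function(cols) {
        if (is.character(cols)) cols <- match(cols, names(self$data))
        self$columns <- cols
        invisible(self)
      },
      # `pred` receives the rows still in view (with all columns, so it can test
      # columns that were not selected) and returns one logical per row
      filter_func = function(pred) {
        keep <- pred(self$data[self$rows, , drop = FALSE])
        self$rows <- self$rows[which(keep)]  # which() also drops NA results
        invisible(self)
      }
    ))

    res <- NarrowingFrame$new(mtcars)$
      select_func(c("mpg", "wt"))$
      filter_func(function(d) d$cyl == 4)$
      filter_func(function(d) d$mpg > 30)$
      get_data()
    res
    ```

    Passing the predicate a function rather than a precomputed logical vector is what allows the second filter to run against the already-filtered rows. A vector built from the original data would no longer line up with them.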