r higher-order-functions scoping

Emulate dynamic scoping in R to filter by arbitrary functions


I have a number of files in a directory, and I want to distribute them between two newly created subdirectories based on whether their content satisfies a certain condition. Moreover, I need to perform this operation on multiple directories, and each time, the condition might be different.

The way I thought I would go about this was to create a higher-order function that reads in the files, extracts the property of interest from each, and then applies the condition (supplied as an argument in the form of a function) element-wise to the resulting vector of values:

filter.by.property <- function(where, funct) {
  setwd(where)
  # pattern is a regular expression, so escape the dot and anchor the suffix
  paths <- list.files(".", pattern = "\\.file$")
  Robjects <- Map(read.file, paths)
  values <- sapply(Robjects, property_of_interest)
  passes <- Filter(funct, paths)
  fails <- setdiff(paths, passes)
  dir.create("dir1")
  dir.create("dir2")
  # seq_along() stays correct when either group is empty, unlike 1:length(x)
  file.copy(passes, paste0("./dir1/", seq_along(passes), ".file"))
  file.copy(fails, paste0("./dir2/", length(passes) + seq_along(fails), ".file"))
}

The problem is that I would like the condition-specifying function (supplied in the funct argument) to have access to the values vector. For example:

ten.biggest <- function(path) {
  Robject <- read.file(path)
  property_of_interest(Robject) %in% tail(sort(values), 10)
}

filter.by.property(<place>, ten.biggest)

Since R is lexically scoped, it will look for values in the environment where ten.biggest is defined (the global environment) rather than in the environment where it is called (i.e., inside filter.by.property). I could solve this by using global assignment for values inside filter.by.property, but I'd prefer not to do that unless absolutely necessary. I could also supply values as another argument to ten.biggest, but then the function would no longer be unary, and I'm not sure how (or even whether) it could still be used inside Filter.
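The scoping behaviour described above can be reproduced without any files. This is a minimal sketch; show_values and caller are hypothetical names used only for illustration:

```r
# A top-level binding, visible from the environment where show_values is defined.
values <- "global"

# Defined at top level, so free variables are resolved there (lexical scoping).
show_values <- function() values

caller <- function() {
  values <- "local to caller"  # lives in caller's frame only
  show_values()                # does not see the local binding
}

result <- caller()
print(result)  # the top-level binding is the one that is found
```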

One solution I attempted was to rewrite ten.biggest as follows:

ten.biggest <- function(path) {
  Robject <- read.file(path)
  property_of_interest(Robject) %in% tail(sort(eval.parent(parse(text = values))), 10)
}

but that didn't work.

Is there any way I can emulate dynamic scoping in R to make this work? From reading related StackExchange questions, it seems that perhaps the body function could help, but I don't know how to apply it to my problem.
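On the literal question of emulating dynamic scoping, base R provides dynGet(), which looks a name up in the frames of the calling functions instead of the defining environment; its help page describes it as somewhat experimental. The following is only a sketch under simplifying assumptions: top_three and filter_demo are hypothetical names, and the predicate works on numbers directly so that no files are needed.

```r
# Hypothetical stand-in for ten.biggest that operates on numbers, not paths.
top_three <- function(v) {
  # Search the callers' frames for `values` (dynamic lookup, not lexical).
  vals <- dynGet("values")
  v %in% tail(sort(vals), 3)
}

# Hypothetical stand-in for filter.by.property: `values` exists only in
# this function's frame, yet top_three finds it through dynGet().
filter_demo <- function(values) {
  Filter(top_three, values)
}

kept <- filter_demo(c(5, 1, 9, 3, 7, 2))
print(kept)
```

The lookup succeeds only because none of the frames between top_three and filter_demo happens to define its own values, which is one reason passing the vector explicitly is usually the more robust choice.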


Solution

  • I found a solution that doesn't rely on switching environments. I let ten.biggest take two arguments and defined a helper function, outer_fxn, inside filter.by.property that turns the binary function into a unary one by fixing its second argument. The binary ten.biggest is passed to filter.by.property through the funct argument, outer_fxn sets its vals argument to values, and the resulting unary function can then be used inside Filter:

    # Binary predicate: vals is supplied by filter.by.property
    ten.biggest <- function(path, vals) {
      Robject <- read.file(path)
      property_of_interest(Robject) %in% tail(sort(vals), 10)
    }
    
    filter.by.property <- function(where, funct) {
      setwd(where)
      # pattern is a regular expression, so escape the dot and anchor the suffix
      paths <- list.files(".", pattern = "\\.file$")
      Robjects <- Map(read.file, paths)
      values <- sapply(Robjects, property_of_interest)
      # Fix the second argument so that Filter receives a unary function
      outer_fxn <- function(x) funct(x, values)
      passes <- Filter(outer_fxn, paths)
      fails <- setdiff(paths, passes)
      dir.create("dir1")
      dir.create("dir2")
      # seq_along() stays correct when either group is empty, unlike 1:length(x)
      file.copy(passes, paste0("./dir1/", seq_along(passes), ".file"))
      file.copy(fails, paste0("./dir2/", length(passes) + seq_along(fails), ".file"))
    }
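Since read.file and property_of_interest are not defined in the question, the pattern can be tried end to end with stand-ins. The sketch below assumes that read.file is readLines and that the property of interest is the number of lines; n.biggest and fix_second are hypothetical names, and a temporary directory is used instead of setwd so nothing outside it is touched.

```r
# Assumed stand-ins for the helpers the question leaves undefined.
read.file <- function(path) readLines(path)
property_of_interest <- function(x) length(x)  # here: number of lines

# Binary predicate: TRUE if the file's value is among the n largest in vals.
n.biggest <- function(path, vals, n = 2) {
  property_of_interest(read.file(path)) %in% tail(sort(vals), n)
}

# Turn a binary function into a unary one by fixing its second argument.
fix_second <- function(f, second) {
  force(f)
  force(second)
  function(first) f(first, second)
}

# Scratch directory holding four files with 1, 2, 3 and 4 lines.
demo_dir <- tempfile("filter-demo-")
dir.create(demo_dir)
for (i in 1:4) {
  writeLines(rep("x", i), file.path(demo_dir, paste0(i, ".file")))
}

paths <- list.files(demo_dir, pattern = "\\.file$", full.names = TRUE)
values <- vapply(paths, function(p) property_of_interest(read.file(p)), integer(1))

# Filter only ever sees a unary function; values travels inside the closure.
passes <- Filter(fix_second(n.biggest, values), paths)
kept <- basename(passes)
print(kept)

unlink(demo_dir, recursive = TRUE)  # clean up the scratch directory
```

The factory does the same job as outer_fxn, but it lives outside filter.by.property, so it can be reused with any binary predicate.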