I have a number of files in a directory, and I want to distribute them between two newly created subdirectories based on whether their content satisfies a certain condition. Moreover, I need to perform this operation on multiple directories, and each time, the condition might be different.
The way I thought I would go about this was to create a higher-order function that reads in the files, extracts the property of interest from each, and then applies the condition (supplied as an argument in the form of a function) element-wise to the resulting vector of values:
filter.by.property <- function(where, funct) {
  old <- setwd(where)
  on.exit(setwd(old))  # restore the working directory on exit
  # pattern is a regular expression, so escape the dot and anchor the end
  paths <- list.files(".", pattern = "\\.file$")
  Robjects <- Map(read.file, paths)
  values <- sapply(Robjects, property_of_interest)
  passes <- Filter(funct, paths)
  fails <- setdiff(paths, passes)
  dir.create("dir1")
  dir.create("dir2")
  # seq_along() avoids the 1:0 trap when either group is empty
  file.copy(passes, paste0("./dir1/", seq_along(passes), ".file"))
  file.copy(fails, paste0("./dir2/", length(passes) + seq_along(fails), ".file"))
}
The problem is that I would like the condition-specifying function (supplied in the funct argument) to have access to the values vector. For example:
ten.biggest <- function(path) {
  Robject <- read.file(path)
  # values is a free variable here; it is not defined inside this function
  property_of_interest(Robject) %in% tail(sort(values), 10)
}
filter.by.property(<place>, ten.biggest)
Since R is lexically scoped, it will look for values in the environment where ten.biggest is defined (the global environment) rather than in the environment where it is called (i.e., inside filter.by.property). I could solve this by using global assignment for values inside filter.by.property, but I'd prefer not to do that unless absolutely necessary. I could also supply values as another argument to ten.biggest, but then the function would no longer be unary, and I'm not sure how (or even whether) it could still be used inside Filter.
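To make the scoping behaviour concrete, here is a minimal sketch that needs no files. The names show.values and caller are hypothetical and exist only for this illustration. The free variable is resolved where the function was defined, so the local vector inside caller is never seen.

```r
# Hypothetical names, for illustration only.
show.values <- function() values   # `values` is a free variable

caller <- function() {
  values <- c(1, 2, 3)             # local to caller(), invisible to show.values()
  show.values()
}

# With no `values` in the defining environment, the lookup fails.
failed <- inherits(try(caller(), silent = TRUE), "try-error")

# Once a `values` exists where show.values() was defined, that one is found,
# not the local copy inside caller().
values <- c(10, 20)
found <- caller()
```

After running this, failed is TRUE and found holds the vector defined at top level rather than the one local to caller.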
One solution I attempted was to rewrite ten.biggest as follows:
ten.biggest <- function(path) {
  Robject <- read.file(path)
  property_of_interest(Robject) %in% tail(sort(eval.parent(parse(text = values))), 10)
}
but that didn't work.
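One likely reason, sketched below with hypothetical names (peek, direct, via.lapply, local.values): parse(text = values) has to evaluate values inside ten.biggest before eval.parent ever runs, so the same lookup failure occurs. Even with the symbol quoted, eval.parent only looks in the immediate caller, and Filter calls its predicate through lapply, so the immediate caller is lapply's frame rather than filter.by.property.

```r
# Hypothetical helper: looks up `local.values` one frame up the call stack.
peek <- function(x) eval.parent(quote(local.values))

direct <- function() {
  local.values <- 42
  peek(1)                      # the caller is direct(), so the lookup succeeds
}

via.lapply <- function() {
  local.values <- 42
  # Here the immediate caller of peek() is lapply(), not via.lapply(),
  # so the lookup fails and the error handler returns NA.
  tryCatch(lapply(1, peek)[[1]], error = function(e) NA)
}

direct.result <- direct()
lapply.result <- via.lapply()
```

The direct call finds the local variable, while the call routed through lapply does not, which mirrors what happens under Filter.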
Is there any way I can emulate dynamic scoping in R to make this work? From reading related StackExchange questions, it seems that perhaps the body function could help, but I don't know how to apply it to my problem.
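One way to approximate dynamic scoping, offered as a sketch rather than a recommendation, is to rebind the predicate's enclosing environment to the calling function's frame before using it. The toy below replaces the files with plain numbers; top.two and filter.by.values are hypothetical stand-ins for ten.biggest and filter.by.property.

```r
# Toy stand-in for the real problem: the "files" are just numbers.
top.two <- function(x) x %in% tail(sort(values), 2)   # `values` is free

filter.by.values <- function(xs, funct) {
  values <- xs
  # Rebind the local copy of funct so its free variables resolve in this frame.
  # The caller's original top.two is left untouched.
  environment(funct) <- environment()
  Filter(funct, xs)
}

kept <- filter.by.values(c(5, 1, 9, 3), top.two)
```

Here kept contains 5 and 9. The trade-off is that the rebound function no longer sees whatever it originally closed over, so this only suits predicates defined at top level.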
I found a solution that doesn't rely on switching environments. I allowed ten.biggest to take two arguments, and defined a helper function, outer_fxn, right inside filter.by.property to convert a binary function into a unary function by setting one of its arguments. The binary ten.biggest function can then be passed to filter.by.property via the latter's funct argument, the vals argument of ten.biggest is set to values by outer_fxn, and the resulting unary function can then be used inside Filter:
ten.biggest <- function(path, vals) {
  Robject <- read.file(path)
  property_of_interest(Robject) %in% tail(sort(vals), 10)
}
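filter.by.property <- function(where, funct) {
  old <- setwd(where)
  on.exit(setwd(old))  # restore the working directory on exit
  # pattern is a regular expression, so escape the dot and anchor the end
  paths <- list.files(".", pattern = "\\.file$")
  Robjects <- Map(read.file, paths)
  values <- sapply(Robjects, property_of_interest)
  # Fix the second argument of funct to values, leaving a unary function
  outer_fxn <- function(x) { funct(x, values) }
  passes <- Filter(outer_fxn, paths)
  fails <- setdiff(paths, passes)
  dir.create("dir1")
  dir.create("dir2")
  # seq_along() avoids the 1:0 trap when either group is empty
  file.copy(passes, paste0("./dir1/", seq_along(passes), ".file"))
  file.copy(fails, paste0("./dir2/", length(passes) + seq_along(fails), ".file"))
}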
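The same partial-application idea can be tried without any files. In this sketch the names among.largest and make.among.largest are hypothetical, and plain numbers stand in for the per-file property. It shows the wrapper approach used above next to a function factory, which captures the vector in a closure instead.

```r
# Hypothetical binary predicate: is x among the n largest entries of vals?
among.largest <- function(x, vals, n = 2) x %in% tail(sort(vals), n)

vals <- c(5, 1, 9, 3)

# Option 1: wrap the binary function in an anonymous unary one,
# which is what outer_fxn does inside filter.by.property.
kept1 <- Filter(function(x) among.largest(x, vals), vals)

# Option 2: a function factory that captures vals in a closure
# and computes the cutoff set only once.
make.among.largest <- function(vals, n = 2) {
  cutoff <- tail(sort(vals), n)
  function(x) x %in% cutoff
}
kept2 <- Filter(make.among.largest(vals), vals)
```

Both kept1 and kept2 contain 5 and 9. Because %in% is vectorised, a similar selection could probably also be made by indexing directly with a logical vector built from the precomputed values, which would avoid reading each file a second time inside the predicate.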