Loading Multiple RDS Files in R as Multiple Objects in a Custom Function


I'm trying to write a custom function to load multiple RDS files and assign them to separate objects within my environment. The code for the function is below:

read_multi_rds <- function(filepath, regrex) {

  ## grab all files in filepath with regrex provided
  files <- list.files(path = filepath, pattern = regrex) 
  var_names <- character(0)

  for(i in 1:length(files)){
    name <- substr(files[i], 1, (nchar(files[i])-4)) ## -4 to remove the .rds from the var name
    var_names[i] <- name
  }

  for(i in 1:length(files)){
    file <- readRDS(paste0(filepath, files[i]))
    assign(var_names[i], file)
  }
}

When I test this function by running each bit of the function separately:

filepath <- "I:/Data Sets/"
regrex <- "^cleaned"

files <- list.files(path = filepath, pattern = regrex) 
var_names <- character(0)

...followed by...

for(i in 1:length(files)){
    name <- substr(files[i], 1, (nchar(files[i])-4)) ## -4 to remove the .rds from the var name
    var_names[i] <- name
  }

...and finally...

for(i in 1:length(files)){
    file <- readRDS(paste0(filepath, files[i]))
    assign(var_names[i], file)
  }

...the objects are loaded into the environment.

But when I try to load the objects using the function:

read_multi_rds(filepath = "I:/Data Sets/", regrex = "^cleaned")

Nothing loads. I've added the line:

print('done')

at the end of the function to make sure it's running in its entirety, and it seems to be. I'm not getting any error messages or warnings, either.

Is there something I need to add into the function to properly load these items into my environment? Or is this just not possible to do as a function in R? I'm happy just using the code as is within my scripts, but being able to use it as a function would be much neater if I could pull it off.


Solution

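  • By default, assign() creates the variable in the environment it is called from. Inside a function, that is the function's own environment, which is discarded when the function returns. That is why the loops worked when run at the top level but appeared to do nothing inside read_multi_rds. To create the objects in the global environment, pass envir = .GlobalEnv to assign, as the following code illustrates: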

    data(mtcars)
    tmp <- tempfile(fileext = ".csv")
    write.csv(mtcars, tmp)
    
    read_wrong <- function(file_name = tmp) {
       f <- read.csv(file_name)
       assign("my_data", f)
       ls() # shows that my_data is in the current environment
    }
    
    read_correct <- function(file_name = tmp) {
       f <- read.csv(file_name)
       assign("my_data", f, envir = .GlobalEnv)
       ls() # shows that my_data is not in the current environment
    }
    
    
    read_wrong()
    # [1] "f"         "file_name" "my_data"
    ls() # no my_data
    # [1] "mtcars"       "read_correct" "read_wrong"   "tmp"
    read_correct()
    # [1] "f"         "file_name"
    ls()
    # [1] "mtcars"       "my_data"      "read_correct" "read_wrong"   "tmp" 
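
    Applied to the function from the question, the same fix might look like the sketch below. The envir argument is an addition for illustration (it is not in the original function), and the demo directory and file names are made up so the example runs on its own. It also uses full.names = TRUE so the path does not need a trailing slash, and loops over the files directly so an empty match does not trip over 1:length(files).

    ```r
    # Sketch: read_multi_rds with an explicit target environment.
    # `envir` is a hypothetical extra parameter; it defaults to the global environment.
    read_multi_rds <- function(filepath, regrex, envir = .GlobalEnv) {
      # full.names = TRUE returns complete paths, so no paste0() is needed
      files <- list.files(path = filepath, pattern = regrex, full.names = TRUE)

      for (f in files) {
        file_name <- basename(f)
        # drop the last four characters (".rds"), as in the question
        var_name <- substr(file_name, 1, nchar(file_name) - 4)
        assign(var_name, readRDS(f), envir = envir)
      }

      invisible(files)
    }

    # Usage with throwaway files in a temporary directory (names are made up)
    demo_dir <- file.path(tempdir(), "rds_demo_assign")
    dir.create(demo_dir, showWarnings = FALSE)
    saveRDS(mtcars, file.path(demo_dir, "cleaned_cars.rds"))
    saveRDS(iris, file.path(demo_dir, "cleaned_iris.rds"))

    read_multi_rds(demo_dir, "^cleaned")
    exists("cleaned_cars")  # check that the object now exists outside the function
    ```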
    

    Having said that, I would not use assign in the first place. I would instead return a named list of data frames from the function:

    read_better <- function(file_name = tmp) {
       # use the argument, not the global tmp; do some parsing here to get a proper object name
       parsed_name <- basename(file_name)
       f <- read.csv(file_name)
       setNames(list(f), parsed_name)
    }
    all_data <- read_better()
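
    For the multi-file case in the question, the list-returning approach could be sketched as follows. The function name read_rds_list and the demo files are assumptions made for illustration; tools::file_path_sans_ext is used here in place of the substr(..., nchar(...) - 4) trick to strip the extension.

    ```r
    # Sketch: read every matching RDS file and return a named list.
    # `read_rds_list` is a hypothetical name, not part of the original post.
    read_rds_list <- function(filepath, regrex) {
      files <- list.files(path = filepath, pattern = regrex, full.names = TRUE)
      # name each element after its file, minus the extension
      obj_names <- tools::file_path_sans_ext(basename(files))
      setNames(lapply(files, readRDS), obj_names)
    }

    # Usage with throwaway files in a temporary directory (names are made up)
    demo_dir <- file.path(tempdir(), "rds_demo_list")
    dir.create(demo_dir, showWarnings = FALSE)
    saveRDS(mtcars, file.path(demo_dir, "cleaned_cars.rds"))
    saveRDS(iris, file.path(demo_dir, "cleaned_iris.rds"))

    all_data <- read_rds_list(demo_dir, "^cleaned")
    names(all_data)              # one element per file
    head(all_data$cleaned_cars)  # access a single data set by name
    ```

    If separate objects are still wanted afterwards, list2env(all_data, envir = .GlobalEnv) would copy each list element into the global environment, which keeps that side effect visible at the call site rather than hidden inside the function.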