
Loop through rows in list of dataframes and extract data. (Nested "apply" functions)


I am new to R and trying to do things the "R" way, which means no for loops. I would like to loop through a list of dataframes, loop through each row of each dataframe, extract data based on some criteria, and store the results in a master dataframe.

One issue I am having is accessing the "global" master dataframe from inside the applied function. I am unsure of the best approach (a global variable, or passing it by reference).
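A minimal sketch of the scoping behaviour behind this question (an editorial illustration, not part of the original post; the names `master.df` and `master2.df` are hypothetical). R does not pass data frames by reference: assigning inside the function given to `lapply()` modifies a local copy. The superassignment operator `<<-` does reach an enclosing environment, but the usual R idiom is to return a value from each call and combine the results afterwards.

```r
# Sketch, assuming this is run at the top level (global environment)
master.df <- data.frame(name = character())

# Plain `<-` inside the function creates a *local* master.df;
# the global one is left untouched
invisible(lapply(c("parrot", "deer"), function(animal) {
  master.df <- rbind(master.df, data.frame(name = animal))
}))
nrow(master.df)  # still 0

# `<<-` assigns to master.df in the enclosing (here: global) environment.
# It works, but it hides the data flow and is usually discouraged.
invisible(lapply(c("parrot", "deer"), function(animal) {
  master.df <<- rbind(master.df, data.frame(name = animal))
}))
nrow(master.df)  # now 2

# Idiomatic alternative: each call returns a piece, then combine once
rows <- lapply(c("parrot", "deer"), function(animal) data.frame(name = animal))
master2.df <- do.call(rbind, rows)
nrow(master2.df)  # 2
```

Returning values keeps each function free of side effects, which is why the answer below builds a list of data frames and binds them at the end.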

I have created an abstract example to try to show what needs to be done:

rm(list=ls())## CLEAR WORKSPACE
assign("last.warning", NULL, envir = baseenv())## CLEAR WARNINGS

# Generate a descriptive name with name and size
generateDescriptiveName <- function(animal.row, animalList.vector){

   name <- animal.row["animal"]
   size <- animal.row["size"]

   # if in list of interest prepare name for master dataframe
   if (any(grepl(name, animalList.vector))){
     return (paste0(name, " Sz-", size))
   }

}

# Animals of interest
animalList.vector <- c("parrot", "cheetah", "elephant", "deer", "lizard")

jungleAnimals <- c("ants", "parrot", "cheetah")
jungleSizes <- c(0.1, 1, 50)
jungle.df <- data.frame(jungleAnimals, jungleSizes)


fieldAnimals <- c("elephant", "lion", "hyena")
fieldSizes <- c(1000, 100, 80)
field.df <- data.frame(fieldAnimals, fieldSizes)

forestAnimals <- c("squirrel", "deer", "lizard")
forestSizes <- c(1, 40, 0.2)
forest.df <- data.frame(forestAnimals, forestSizes)

ecosystems.list <- list(jungle.df, field.df, forest.df)

# Final master list
descriptiveAnimal.df <- data.frame(name = character(), descriptive.name = character())

# apply to all dataframes in list
lapply(ecosystems.list, function(ecosystem.df){
  names(ecosystem.df) <- c("animal", "size")
  # apply to each row in dataframe
  output <- apply(ecosystem.df, 1, function(row){generateDescriptiveName(row, animalList.vector)})
  if(!is.null(output)){
    # Add generated names to unique master list (no duplicates)
  }
})
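A caveat about the matching test in `generateDescriptiveName()` (an editorial sketch, not part of the original post): `grepl()` treats its first argument as a regular expression and matches it anywhere inside each element, so a short name can match inside a longer one. The string `"ant"` below is a hypothetical example; none of the question's own data happens to trigger this. `%in%` tests exact membership instead.

```r
animals <- c("parrot", "cheetah", "elephant", "deer", "lizard")

# Substring (regex) match: "ant" is found inside "elephant"
any(grepl("ant", animals))                    # TRUE

# Exact membership test, vectorized over the left-hand side
"ant" %in% animals                            # FALSE
c("ants", "parrot", "cheetah") %in% animals   # FALSE TRUE TRUE
```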

The end result would be:

         name        descriptive.name
1    "parrot"           "parrot Sz-1"
2   "cheetah"         "cheetah Sz-50"
3  "elephant"      "elephant Sz-1000"
4      "deer"            "deer Sz-40"
5    "lizard"         "lizard Sz-0.2"

Solution

  • I did not use your function generateDescriptiveName() because it is more laborious than needed: %in% and paste0() are vectorized, so each data frame can be filtered and labelled in one step. For the same reason I do not see a need for apply() within lapply(). Here is my attempt to generate the desired output. It is not perfect, but I hope it helps.

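    # Filter each data frame with the vectorized %in% operator and build the
    # descriptive names with paste0(); no row-wise apply() is needed
    df_list <- lapply(ecosystems.list, function(ecosystem.df){
      names(ecosystem.df) <- c("animal", "size")
      temp <- ecosystem.df[ecosystem.df$animal %in% animalList.vector, ]
      # data frames with no matching animals return NULL, which rbind() skips
      if(nrow(temp) > 0){
        data.frame(name = temp$animal,
                   descriptive.name = paste0(temp$animal, " Sz-", temp$size))
      }
    })
    
    # Stack the per-ecosystem results into the master data frame;
    # unique() drops duplicate rows
    descriptiveAnimal.df <- unique(do.call("rbind", df_list))
    descriptiveAnimal.df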
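An alternative sketch (not from the original answer): because every data frame has the same two-column layout, one can rename and stack them first, then filter and label all rows in a single vectorized pass. The block recreates the question's data so it runs on its own, and `unique()` covers the "no duplicates" requirement. Note that with this data parrot's size is 1, so its label comes out as `"parrot Sz-1"`.

```r
# Same data as in the question
animalList.vector <- c("parrot", "cheetah", "elephant", "deer", "lizard")
ecosystems.list <- list(
  data.frame(jungleAnimals = c("ants", "parrot", "cheetah"),
             jungleSizes   = c(0.1, 1, 50)),
  data.frame(fieldAnimals = c("elephant", "lion", "hyena"),
             fieldSizes   = c(1000, 100, 80)),
  data.frame(forestAnimals = c("squirrel", "deer", "lizard"),
             forestSizes   = c(1, 40, 0.2))
)

# Give every data frame the same column names, then stack them once
all.df <- do.call(rbind, lapply(ecosystems.list, setNames, c("animal", "size")))

# One vectorized filter and one vectorized paste0() over all rows
keep <- all.df[all.df$animal %in% animalList.vector, ]
descriptiveAnimal.df <- unique(data.frame(
  name             = as.character(keep$animal),
  descriptive.name = paste0(keep$animal, " Sz-", keep$size),
  stringsAsFactors = FALSE
))
descriptiveAnimal.df
```

Stacking first also combines the non-matching rows before they are dropped; with a handful of small data frames this makes no practical difference, and it leaves a single filter step instead of one per data frame.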