Search code examples
rautomation

How can I automate looking up, filtering, and saving values in a new list/dataframe


I have a larger dataset from which I computed different values. The interesting values are stored in dataframes within lists within lists. Now I want to automate looking up the interesting values and storing them in a handy way to perform further tasks (e.g. print out which units of analysis are of interest).

I am at a loss how to automate this tedious task, because placeholders don't work anymore - or none that I know of.

EXAMPLE:

library(tidyverse)
library(flextable)
set.seed(1234)
largelist <- list(
  Tab_USA_2001 = list(df = as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=5)), case = c(1)),
  Tab_CAN_1999 = list(df = as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=5)), case = c(2))
)

I want to perform these two steps (but not only with two examples but many more):

Important_data <- list(TAB_USA_2001 = filter(largelist$Tab_USA_2001$df, V1 > 10),
                       TAB_CAN_1999 = filter(largelist$Tab_CAN_1999$df, V1 > 10)
                       )
flextable(Important_data$TAB_USA_2001)
flextable(Important_data$TAB_CAN_1999)

The data is always stored following this convention of

  • Tab (for Table of Results from previous steps)
  • XXX (Three letter code for the analyzed country)
  • YYYY (Four numbers for the year of analysis)

Not all countries are analyzed for every year and there are only certain years in the analysis (e.g., 1991, 1999, 2001, 2003) which are not directly related.

Furthermore, I don't yet understand how I can access data that is buried deep in the list without typing the ...$...$... out everytime - shouldn't there be a way to make this easier. For example saying largelist$[PLACEHOLDER]$df

How could I create a loop/automation cycle that brings the interesting values from the Tables of Results together?


Solution

  • If largelist already has all your tables, here's a way to reshape your list into a data frame with each table name, its related data and case value, separated out for easier filtering and analysis.

    enframe(largelist) |>
      unnest_wider(value) |>
      unnest(df) |>
      separate(name, c("Tab", "Country", "Year"), remove = FALSE) |>
      filter(V1 > 10)
    
    # A tibble: 7 × 7
      name         Tab   Country Year     V1    V2  case
      <chr>        <chr> <chr>   <chr> <dbl> <dbl> <dbl>
    1 Tab_USA_2001 Tab   USA     2001   12.8  1.18     1
    2 Tab_USA_2001 Tab   USA     2001   12.6  5.42     1
    3 Tab_USA_2001 Tab   USA     2001   12.8 13.7      1
    4 Tab_USA_2001 Tab   USA     2001   17.4 10.8      1
    5 Tab_CAN_1999 Tab   CAN     1999   14.2 16.9      2
    6 Tab_CAN_1999 Tab   CAN     1999   11.4  6.44     2
    7 Tab_CAN_1999 Tab   CAN     1999   18.5  4.55     2