
How to aggregate over multiple data.frames using something like rbind in R


I have a set of dataframes containing individual-level person data from a complex, weight-adjusted survey at the state level, one dataframe per state:

df_1, df_2, ..., df_50

I have a function, calc_wt(data, age_min, age_max), that takes an individual dataframe (like df_1), a minimum age, and a maximum age. It subsets the dataset to people within that age range and returns the unweighted count and the weighted means/SEs of the individual-level data in that data.frame.

What I want for each minimum and maximum age range is a results dataframe in which each row is the aggregated data returned by calc_wt() for one of df_1, df_2, ..., df_50.

So I want something like:

rbind(calc_wt(df_1,age_min = 18, age_max = 84),
      calc_wt(df_2,age_min = 18, age_max = 84),
      ....,
      calc_wt(df_50,age_min = 18, age_max = 84))

But is there a way to do it without specifying each input dataframe exactly? Maybe something like purrr?


Solution

  • In base R: mget() + lapply() + do.call("rbind", ...)

    df_list <- mget(ls(pattern="^df_[0-9]+$"))
    cw_list <- lapply(df_list, calc_wt, age_min = 18, age_max = 84)
    result <- do.call("rbind", cw_list)
    

    You can do this in the tidyverse too once you've got df_list, using map() + list_rbind() (or map_dfr(), which I prefer but which the purrr documentation marks as superseded ...)
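
    A minimal sketch of the map() + list_rbind() route, assuming purrr 1.0.0 or later is installed. The calc_wt() below is a hypothetical stand-in (the real one returns weighted means/SEs), and the two toy data frames stand in for df_1, ..., df_50:

    ```r
    library(purrr)  # list_rbind() requires purrr >= 1.0.0

    # Hypothetical stand-in for the asker's calc_wt(): subsets by age and
    # returns a one-row summary. The real function is not shown in the question.
    calc_wt <- function(data, age_min, age_max) {
      sub <- data[data$age >= age_min & data$age <= age_max, , drop = FALSE]
      data.frame(n = nrow(sub), mean_age = mean(sub$age))
    }

    # Toy inputs standing in for the per-state data frames.
    df_list <- list(
      df_1 = data.frame(age = c(17, 25, 40)),
      df_2 = data.frame(age = c(30, 60, 90))
    )

    # Apply calc_wt() to every element, then stack the one-row results.
    cw_list <- map(df_list, calc_wt, age_min = 18, age_max = 84)

    # names_to stores each list name (df_1, df_2, ...) in a column,
    # so every row can be traced back to its source data frame.
    result <- list_rbind(cw_list, names_to = "source")
    ```

    The names_to argument is optional; without it the list names are dropped and only the stacked rows remain.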

    It would be more robust to go upstream and create your df_* objects as a list in the first place (e.g. use map() or lapply() with read_csv() (or whatever) and a character vector of file names ...), rather than cluttering your workspace with them and then using mget() to retrieve them.
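
    A rough sketch of that upstream approach, assuming the per-state data live in CSV files. The file names and the temporary directory are made up so the example runs on its own, and base read.csv() is used here in place of read_csv():

    ```r
    # Write two small CSV files to a temporary directory so the sketch is
    # self-contained; in practice these would be the existing state files.
    tmp <- tempfile("states_")
    dir.create(tmp)
    write.csv(data.frame(age = c(17, 25, 40)),
              file.path(tmp, "df_1.csv"), row.names = FALSE)
    write.csv(data.frame(age = c(30, 60, 90)),
              file.path(tmp, "df_2.csv"), row.names = FALSE)

    # Character vector of file paths, then one data frame per file.
    files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
    df_list <- lapply(files, read.csv)

    # Name the list elements after the files so results stay traceable.
    names(df_list) <- tools::file_path_sans_ext(basename(files))
    ```

    The resulting df_list can be passed straight to lapply() or map() with calc_wt, with no need for mget().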