
Extending an sapply to a list of variables and saving the output as a list of data frames in R


I have a data set of complex sample data, similar to the example below. Thanks to SO user IRTFM, I was able to adapt the code and save the results (I'm only interested in the total proportions, not the confidence intervals) as a reshaped object for further processing. What I would like to do is extend this sapply to generate results for 20 other variables, ideally saving the results as data frames in a list, since I think this is the most efficient way. My struggle is how to extend the sapply so that I can process multiple variables at once. I thought about a for loop over a list that holds the names of the variables and started to make this list (var_list below), but this does not seem to be the way forward. I'd rather take advantage of the apply family, since I would like the results to be stored in a list.

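library(survey) # using the `dclus1` object that is standard in the examples.
library(reshape)
library(tidyverse)

data(api)
# dclus1 as defined in the survey package examples
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

stype_t <- sapply(levels(dclus1$variables$stype),
        function(x) {
           # build e.g. ~I(stype %in% "E") for each level of stype
           form <- as.formula(substitute(~I(stype %in% x), list(x = x)))
           z <- svyciprop(form, dclus1, method = "me", df = degf(dclus1))
           c(z, c(attr(z, "ci")))
        }) %>%
  as.data.frame() %>%
  slice(1) %>%   # keep the proportions, drop the confidence interval rows
  reshape::melt() %>%
  dplyr::mutate(value = round(value, digits = 4) * 100)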

Let's say you then wanted to repeat the above using the variable awards. You could copy the lines and edit them, but that is not efficient. So I started by making a list of the names of the two variables in this example data, but I am stumped as to how to apply this list to the code above and retain the results in a list of data frames. I tried wrapping the sapply in an lapply, but this did not work, presumably because I set it up incorrectly. Any advice or thoughts would be appreciated.

var_list <- list("stype", "awards")

Solution

  • To reference a column by a name held in a string, use the [[ extractor instead of $. Also extend substitute so that the variable is inserted dynamically, converting the string to a symbol with as.name:

    # DEFINED METHOD
    df_build <- function(var) {
      # [[ looks up the column by the string held in var
      sapply(levels(dclus1$variables[[var]]), function(x) {
        # as.name() turns the string into a symbol, giving e.g. ~I(stype %in% "E")
        form <- as.formula(substitute(~I(var %in% x),
                                      list(var = as.name(var), x = x)))
        z <- svyciprop(form, dclus1, method = "me", df = degf(dclus1))
        c(z, c(attr(z, "ci")))
      }) %>%
        as.data.frame() %>%
        slice(1) %>%
        reshape::melt() %>%
        dplyr::mutate(value = round(value, digits = 4) * 100)
    }
    
    # ITERATE THROUGH LIST OF VARIABLE NAMES AND CALL METHOD
    var_list <- list("stype", "awards")
    df_list <- lapply(var_list, df_build)
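
To see what the two changes do without running the survey estimation, here is a minimal base-R sketch. The `toy` data frame and the `make_formula` helper are hypothetical stand-ins made up for illustration (they are not part of the survey package or of the code above); the sketch only builds the formulas that would be passed to svyciprop.

```r
# Toy data frame standing in for dclus1$variables (made-up values).
toy <- data.frame(
  stype  = factor(c("E", "H", "M", "E")),
  awards = factor(c("Yes", "No", "Yes", "Yes"))
)

# Hypothetical helper: build the one-sided formula for one variable name
# and one level. as.name() turns the string into a symbol, so the formula
# refers to the column itself rather than to a literal string.
make_formula <- function(var, x) {
  as.formula(substitute(~I(var %in% x), list(var = as.name(var), x = x)))
}

vars <- c("stype", "awards")

# One list element per variable, each holding one formula per level.
# `[[` accepts the column name as a string, which `$` does not.
# Naming the input vector makes lapply() return a named list.
formulas <- lapply(setNames(vars, vars), function(v) {
  lapply(levels(toy[[v]]), function(x) make_formula(v, x))
})

print(names(formulas))
print(formulas$awards[[1]])
```

Passing a named vector to lapply is also one way to get a named result in the real case: something like lapply(setNames(var_list, var_list), df_build) should let each data frame be retrieved by its variable name, e.g. df_list$awards.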