Search code examples
ruser-defined-functionsweightedexpss

Passing arguments dynamically in Expss tables with user-defined functions


I have a (new) question related to expss tables. I wrote a very simple UDF (that relies on few expss functions), as follows:

library(expss)
z_indices <- function(x, m_global, std_global, weight=NULL){
  if(is.null(weight)) weight = rep(1, length(x))
  z <- (w_mean(x, weight)-m_global)/std_global
  indices <- 100+(z*100)
  return(indices)
}

Reproducible example, based on infert dataset (plus a vector of arbitrary weights):

data(infert)
infert$w <- as.vector(x=rep(2, times=nrow(infert)), mode='numeric')
infert %>%
  tab_cells(age, parity) %>%
  tab_cols(total(), education, case %nest% list(total(), education)) %>%
  tab_weight(w) %>%
  tab_stat_valid_n(label="N") %>%
  tab_stat_mean(label="Mean") %>%
  tab_stat_fun(label="Z", function(x, m_global, std_global, weight=NULL){
    z_indices(x, m_global=w_mean(infert$age, infert$w),std_global=w_sd(infert$age, infert$w))
    }) %>%
  tab_pivot(stat_position="inside_columns")

The table is computed and the output for the first line is (almost) as expected. Then things go messy for the second line, since both arguments of z_indices explicitely refer to infert$age, where infert$parity is expected. My question: is there a way to dynamically pass the variables of tab_cells as function argument within tab_stat_fun to match the variable being processed? I guess this happens inside function declaration but have not clue how to proceed...

Thanks!

EDIT April 28th 2020: Answer from @Gregory Demin works great in the scope of infert dataset, although for better scalability to larger dataframes I wrote the following loop:

var_df <- data.frame("age"=infert$age, "parity"=infert$parity)
tabZ=infert
for(each in names(var_df)){
  tabZ = tabZ %>%
    tab_cells(var_df[each]) %>%
    tab_cols(total(), education) %>%
    tab_weight(w) %>%
    tab_stat_valid_n(label="N") %>%
    tab_stat_mean(label="Mean") %>%
    tab_stat_fun(label="Z", function(x, m_global, std_global, weight=NULL){
      z_indices(x, m_global=w_mean(var_df[each], infert$w),std_global=w_sd(var_df[each], infert$w))
    })
} 
tabZ = tabZ %>% tab_pivot()

Hope this inspires other expss users in the future!


Solution

  • There is no universal solution for this case. Function in the tab_stat_fun is always calculated inside cell so you can't get global values in it. However, in your case we can calculate z-index before summarizing. Not so flexible solution but it works:

    # function for weighted z-score
    w_z_index = function(x, weight = NULL){
        if(is.null(weight)) weight = rep(1, length(x))
        z <- (x - w_mean(x, weight))/w_sd(x, weight)
        indices <- 100+(z*100)
        return(indices)
    }
    
    data(infert)
    infert$w <- rep(2, times=nrow(infert))
    infert %>%
        tab_cells(age, parity) %>%
        tab_cols(total(), education, case %nest% list(total(), education)) %>%
        tab_weight(w) %>%
        tab_stat_valid_n(label="N") %>%
        tab_stat_mean(label="Mean") %>%
        # here we get z-index instead of original variables
        tab_cells(age = w_z_index(age, w), parity = w_z_index(parity, w)) %>%
        tab_stat_mean(label="Z") %>%
        tab_pivot(stat_position="inside_columns")
    

    UPDATE. A little more scalable approach:

    w_z_index = function(x, weight = NULL){
        if(is.null(weight)) weight = rep(1, length(x))
        z <- (x - w_mean(x, weight))/w_sd(x, weight)
        indices <- 100+(z*100)
        return(indices)
    }
    
    w_z_index_df = function(df, weight = NULL){
        df[] = lapply(df, w_z_index, weight = weight)
        df
    }
    
    data(infert)
    infert$w <- rep(2, times=nrow(infert))
    infert %>%
        tab_cells(age, parity) %>%
        tab_cols(total(), education, case %nest% list(total(), education)) %>%
        tab_weight(w) %>%
        tab_stat_valid_n(label="N") %>%
        tab_stat_mean(label="Mean") %>%
        # here we get z-index instead of original variables
        # we process a lot of variables at once
        tab_cells(w_z_index_df(data.frame(age, parity))) %>%
        tab_stat_mean(label="Z") %>%
        tab_pivot(stat_position="inside_columns")