
Run sapply function with 2 inputs (variable and dataframe)


I am running a function to perform weighted two-sample t-tests on multiple subsets of a dataframe. A reproducible version of my code (using the mtcars dataset) is the following:

library(tidyverse)
library(weights)
df_list <- split(mtcars, mtcars$carb)
multiple_wt_ttest <- function(df) {
  ttest = wtd.t.test(x = subset(df, am == 0)$disp, y = subset(df, am == 1)$disp,
                     weight = subset(df, am == 0)$wt, weighty = subset(df, am == 1)$wt,
                     samedata = FALSE)
  out <<- ttest[2]
}

store <- do.call(rbind, sapply(df_list, multiple_wt_ttest))

This yields a dataframe with the desired t-test for each subset of mtcars, split by the variable carb. Now I want to repeat this, not just for the variable disp but for multiple variables in the dataframe (in mtcars, for example, drat, cyl, gear, etc.). The code would therefore be something like the following:

library(tidyverse)
library(weights)
df_list <- split(mtcars, mtcars$carb)
var_list <- list("cyl","drat","disp")
multiple_wt_ttest <- function(df, var) {
  ttest = wtd.t.test(x = subset(df, am == 0)$var, y = subset(df, am == 1)$var,
                     weight = subset(df, am == 0)$wt, weighty = subset(df, am == 1)$wt,
                     samedata = FALSE)
  out <<- ttest[2]
}

store <- do.call(rbind, sapply(df_list,var=var_list, multiple_wt_ttest))

But this does not work and yields the error: Error in var(x) : 'x' is NULL

I think this has to do with the fact that the original sapply receives a list of dataframes, whereas the new var_list is a list of variable names. How, then, can I combine two different inputs in my sapply call to repeat these t-tests on each subset of the data for multiple variables (instead of just one) and compile the results next to each other in a table?


Solution

  • Here is a solution.

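    First of all, I have corrected the function so that it can cope with input data.frames that contain only one am value, such as a subset with only one row; in that case it returns NULL instead of running the test.
    The column is also selected with df[[target_var]] instead of df$var: $ looks for a column literally named var, finds none and returns NULL, which is what triggers the error in the question.
    Then, the code that runs for one variable is called in a lapply loop over the list of variables.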
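
As a quick illustration of the column-access difference, here is a minimal base-R sketch. It is not part of the original reprex, and the object names by_dollar and by_bracket are only illustrative. It shows that $ treats the name literally, while [[ evaluates the variable holding the column name.

```r
# `$` uses the name as written; `[[` evaluates its argument first.
target_var <- "disp"                       # column name stored in a variable

by_dollar  <- mtcars$target_var            # looks for a column literally named "target_var"
by_bracket <- mtcars[[target_var]]         # looks up the column named by target_var ("disp")

print(is.null(by_dollar))                  # TRUE: mtcars has no such column
print(identical(by_bracket, mtcars$disp))  # TRUE: same vector as mtcars$disp
```

Passing a NULL like by_dollar on as the x argument of a test function is how a call can end up failing on an empty input.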

    library(weights)
    #> Loading required package: Hmisc
    #> 
    #> Attaching package: 'Hmisc'
    #> The following objects are masked from 'package:base':
    #> 
    #>     format.pval, units
    
    multiple_wt_ttest <- function(df, target_var) {
      i0 <- df$am == 0
      i1 <- df$am == 1
      if(any(i0) && any(i1)) {
        ttest <- wtd.t.test(
          x = df[[target_var]][i0],
          y = df[[target_var]][i1],
          weight = df$wt[i0],
          weighty = df$wt[i1],
          samedata = FALSE
        )
        ttest[[2]]
      } else NULL
    }
    
    df_list <- split(mtcars, mtcars$carb)
    var_list <- list("cyl","drat","disp")
    
    results_list <- lapply(var_list, \(v) {
      store <- do.call(rbind, sapply(df_list, multiple_wt_ttest, target_var = v))
      store <- as.data.frame(store)
      store$variable <- v
      store[c(4, 1:3)]
    })
    
    do.call(rbind, results_list)
    #>    variable   t.value       df     p.value
    #> 1       cyl  2.327192 2.000000 0.145420369
    #> 2       cyl  3.351162 5.000000 0.020303028
    #> 4       cyl  1.068365 3.070500 0.362061152
    #> 11     drat -3.335558 2.345842 0.063563101
    #> 21     drat -3.633611 6.293180 0.010048620
    #> 41     drat -3.455307 7.778648 0.009008048
    #> 12     disp  3.069880 2.183101 0.082230383
    #> 22     disp  3.897422 5.560369 0.009295961
    #> 42     disp  1.697305 4.282223 0.160142699
    

    Created on 2023-05-26 with reprex v2.0.2
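
The question asks for the results "next to each other in a table", whereas the output above is in long format. One possible way to get there, sketched with base R's reshape(), is shown below. To keep the sketch runnable without the weights package, the numbers are copied from the output above and the df column is left out. The carb column is an assumption: it takes the row names 1, 2 and 4 to be the carb groups, as the split on carb suggests.

```r
# Long-format results; values copied from the printed output above.
# The carb column is assumed from the row names (groups 1, 2 and 4).
long <- data.frame(
  carb     = rep(c("1", "2", "4"), times = 3),
  variable = rep(c("cyl", "drat", "disp"), each = 3),
  t.value  = c(2.327192, 3.351162, 1.068365,
               -3.335558, -3.633611, -3.455307,
               3.069880, 3.897422, 1.697305),
  p.value  = c(0.145420369, 0.020303028, 0.362061152,
               0.063563101, 0.010048620, 0.009008048,
               0.082230383, 0.009295961, 0.160142699)
)

# One row per carb group, one pair of columns per tested variable.
wide <- reshape(long, idvar = "carb", timevar = "variable", direction = "wide")
rownames(wide) <- NULL
print(wide)
```

In the actual pipeline, the group label would first have to be stored as a regular column (for example from the row names of store inside the lapply loop), because the final rbind makes the row names unique and so alters them.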