I am running a function to perform weighted two-sample t-tests on multiple subsets of a dataframe. A reproducible version of my code (using the mtcars dataset) is the following:
library(tidyverse)
library(weights)
df_list <- split(mtcars, mtcars$carb)
multiple_wt_ttest <- function(df) {
  ttest = wtd.t.test(x = subset(df, am == 0)$disp, y = subset(df, am == 1)$disp,
                     weight = subset(df, am == 0)$wt, weighty = subset(df, am == 1)$wt,
                     samedata = FALSE)
  out <<- ttest[2]
}
store <- do.call(rbind, sapply(df_list, multiple_wt_ttest))
This yields a dataframe displaying the desired t-test for each subset of mtcars, split on the variable carb. Now, I want to repeat this, not just for the variable disp but for multiple variables in the dataframe (in mtcars, for example, drat, cyl, gear, etc.). The code would therefore be something like the following:
library(tidyverse)
library(weights)
df_list <- split(mtcars, mtcars$carb)
var_list <- list("cyl","drat","disp")
multiple_wt_ttest <- function(df, var) {
  ttest = wtd.t.test(x = subset(df, am == 0)$var, y = subset(df, am == 1)$var,
                     weight = subset(df, am == 0)$wt, weighty = subset(df, am == 1)$wt,
                     samedata = FALSE)
  out <<- ttest[2]
}
store <- do.call(rbind, sapply(df_list,var=var_list, multiple_wt_ttest))
But this does not work and yields the error:
Error in var(x) : 'x' is NULL
I think this has to do with the fact that the original sapply is providing a dataframe, whereas the new var_list is a vector/list of variables. How, then, can I combine 2 different inputs in my sapply function to repeat this process of t-tests on each subset of the data, for multiple variables (instead of just one) and compile the results next to each other in a table?
Here is a solution.
First of all, I have corrected the function so that it can cope with input data.frames that contain only one am value, such as data with only one row. In that case there is nothing to compare, and the function returns NULL.
Then, call the code that runs for one variable in an lapply loop over the list of variables.
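A likely cause of the 'x' is NULL error in the question is the use of $var: the $ operator takes the name after it literally, so it looks for a column actually called var instead of the column whose name is stored in var. The function below uses [[ ]] instead, which evaluates its argument. A minimal base-R sketch of the difference (the variable name target_var is just an illustration):

```r
# `$` does not evaluate its right-hand side: it looks for a column
# literally named "target_var", which mtcars does not have.
target_var <- "disp"
is.null(mtcars$target_var)

# `[[` evaluates its argument, so it returns the column named "disp".
identical(mtcars[[target_var]], mtcars$disp)
```

Both expressions evaluate to TRUE: the first shows that $ returns NULL, the second that [[ ]] returns the intended column.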
library(weights)
#> Loading required package: Hmisc
#>
#> Attaching package: 'Hmisc'
#> The following objects are masked from 'package:base':
#>
#> format.pval, units
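multiple_wt_ttest <- function(df, target_var) {
  # logical indices of the two groups to compare
  i0 <- df$am == 0
  i1 <- df$am == 1
  # run the test only if both groups are present in this subset
  if(any(i0) && any(i1)) {
    ttest <- wtd.t.test(
      x = df[[target_var]][i0],
      y = df[[target_var]][i1],
      weight = df$wt[i0],
      weighty = df$wt[i1],
      samedata = FALSE
    )
    # second list element: t.value, df and p.value
    ttest[[2]]
  } else NULL
}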
df_list <- split(mtcars, mtcars$carb)
var_list <- list("cyl","drat","disp")
results_list <- lapply(var_list, \(v) {
  store <- do.call(rbind, sapply(df_list, multiple_wt_ttest, target_var = v))
  store <- as.data.frame(store)
  store$variable <- v
  store[c(4, 1:3)]
})
do.call(rbind, results_list)
#>    variable   t.value       df     p.value
#> 1       cyl  2.327192 2.000000 0.145420369
#> 2       cyl  3.351162 5.000000 0.020303028
#> 4       cyl  1.068365 3.070500 0.362061152
#> 11     drat -3.335558 2.345842 0.063563101
#> 21     drat -3.633611 6.293180 0.010048620
#> 41     drat -3.455307 7.778648 0.009008048
#> 12     disp  3.069880 2.183101 0.082230383
#> 22     disp  3.897422 5.560369 0.009295961
#> 42     disp  1.697305 4.282223 0.160142699
Created on 2023-05-26 with reprex v2.0.2
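The row names in the output (1, 2 and 4) are the carb groups that contain both am values; the groups for which the function returned NULL are dropped when the pieces are bound together, because rbind ignores NULL arguments. A small base-R sketch of that behaviour, using made-up numbers rather than real test results:

```r
# Hypothetical per-group results (illustrative values only);
# the group named "3" stands for a subset that returned NULL.
parts <- list(
  "1" = c(t.value = 2.3, df = 2, p.value = 0.15),
  "3" = NULL,
  "4" = c(t.value = 1.1, df = 3, p.value = 0.36)
)

# rbind() skips the NULL entry and keeps the list names as row names.
combined <- do.call(rbind, parts)
dim(combined)
rownames(combined)
```

Here combined has two rows, named "1" and "4", and three columns.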