I am writing a function that runs a t-test on a data frame, subsetting the data according to the arguments I define. Here is a working example using the mtcars data:
testfunc <- function(dfrm, varq, factor, gear = FALSE,
                     am = FALSE, carb = FALSE){
  # Subset the data according to the arguments:
  subsetdfrm <- dfrm[which((dfrm[,"gear"] %in% gear) &
                           (dfrm[,"am"] %in% am) &
                           (dfrm[,"carb"] %in% carb)),]
  # Grab the groups to be compared according to arguments
  # (get() looks up the argument whose name is stored in `factor`):
  factorbinary <- get(factor)
  # The t-test, run on the subset (t.test() ignores `data` for two vectors):
  t <- t.test(subsetdfrm[which(subsetdfrm[[factor]]==factorbinary[1]), varq],
              subsetdfrm[which(subsetdfrm[[factor]]==factorbinary[2]), varq])
  print(t)
}
Here is the function in action, comparing cars with 3 gears vs cars with 4 gears, looking at automatic (am=0) cars with 2 to 4 carburetors:
testfunc(mtcars, "mpg", "gear", gear = c(3,4), am = 0, carb = c(2:4))
Note that I set the defaults of the arguments to FALSE. What I want is a default value for these arguments that automatically negates the subsetting, meaning that all values are included. My own best solution was to add if() clauses for each of the arguments at the beginning of the function, like this:
if(identical(gear, FALSE)){gear <- unique(dfrm$gear)}
if(identical(am, FALSE)){am <- unique(dfrm$am)}
if(identical(carb, FALSE)){carb <- unique(dfrm$carb)}
This will become difficult to manage once the number of parameters increases. Is there a default value I can set my arguments to, that will negate the subset?
I imagine something that is equivalent to the opposite of a NULL object: A "not-NULL", or a wildcard object that is simply equal to everything. If not, could I modify my code to make use of the NULL object in the subsetting step?
Searches with keywords "all", "any" and "subset" typically link to pages referring to the functions all() and any() and didn't get me any further. I would appreciate any help, thanks.
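For the NULL idea raised above, here is a minimal sketch of how the subsetting step could treat NULL as "no filter". The helper in_or_all() and the wrapper subset_cars() are hypothetical names made up for this illustration, not part of the original function, and the sketch only covers the subsetting, not the t-test.

```r
# Hypothetical helper: TRUE for every row when `values` is NULL,
# otherwise TRUE only where `x` is one of `values`.
in_or_all <- function(x, values) {
  if (is.null(values)) rep(TRUE, length(x)) else x %in% values
}

# Sketch of the subsetting step with NULL defaults (illustrative name).
subset_cars <- function(dfrm, gear = NULL, am = NULL, carb = NULL) {
  keep <- in_or_all(dfrm$gear, gear) &
    in_or_all(dfrm$am, am) &
    in_or_all(dfrm$carb, carb)
  dfrm[keep, ]
}

nrow(subset_cars(mtcars))                          # no filters: every row is kept
nrow(subset_cars(mtcars, gear = c(3, 4), am = 0))  # only rows matching both filters
```

With this shape, adding another filter means one more in_or_all() term rather than one more if() clause.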
With Frank's contribution in the comments, here is a working solution:
testfunc <- function(dfrm, varq, factor, gear = unique(dfrm$gear),
                     am = unique(dfrm$am), carb = unique(dfrm$carb)){
  # Subset the data according to the arguments:
  subsetdfrm <- dfrm[which((dfrm[,"gear"] %in% gear) &
                           (dfrm[,"am"] %in% am) &
                           (dfrm[,"carb"] %in% carb)),]
  # Grab the groups to be compared according to arguments
  # (get() looks up the argument whose name is stored in `factor`):
  factorbinary <- get(factor)
  # The t-test, run on the subset (t.test() ignores `data` for two vectors):
  t <- t.test(subsetdfrm[which(subsetdfrm[[factor]]==factorbinary[1]), varq],
              subsetdfrm[which(subsetdfrm[[factor]]==factorbinary[2]), varq])
  print(t)
}
In my original code, instead of dfrm, I have a file path that gets imported as dfrm by read.csv(). The function seems to have no problem with the argument defaults referring to dfrm, even though dfrm is only created later, inside the function body.
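The behaviour described here fits with R evaluating default arguments lazily: a default is only computed when the argument is first used, and it is looked up inside the function's own environment. Here is a small sketch of that, assuming a hypothetical function read_and_count() and a temporary CSV written from mtcars to stand in for the real file.

```r
# Hypothetical function: `path` stands in for the file path argument.
read_and_count <- function(path, gear = unique(dfrm$gear)) {
  # dfrm does not exist yet, but the default for `gear` has not been evaluated either
  dfrm <- read.csv(path)
  # First use of `gear`: the default is evaluated here and finds the local dfrm
  sum(dfrm$gear %in% gear)
}

# Write mtcars to a temporary file so the example is self-contained
tmp <- tempfile(fileext = ".csv")
write.csv(mtcars, tmp, row.names = FALSE)

read_and_count(tmp)            # default keeps every gear value, so all rows count
read_and_count(tmp, gear = 4)  # counts only the 4-gear cars
```

Note that this only works as long as the argument is not used before dfrm has been created inside the function.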