r, loops, chi-squared, t-test

What would be a more efficient way to create multiple Chisq/t-tests in R? (using Titanic data)


I have some very rudimentary code that runs a chi-squared-type test (via prop.test) on some of the variables in the Titanic dataset. I would like a way to distinguish categorical from numeric/continuous variables, so that it runs the chi-squared test only on the categorical variables and a t-test on any numeric ones.

I'm interested in being able to compare multiple levels between the Survived and Not-Survived groups like so:

Prop Survived Female vs Prop Not-Survived Female, Prop Survived Class 1 vs Prop Not-Survived Class 1, and so on.

The table subsets in the code below are set up for the Survived/Not-Survived Female comparison.

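# The Titanic table ships with base R (package 'datasets'), so no library() call is needed
titanic <- as.data.frame(Titanic)
# Categorical predictors only: Survived is the grouping variable and Freq is numeric
names <- setdiff(names(titanic), c("Survived", "Freq"))

for (var in names) {
  tabla <- table(titanic$Survived, titanic[[var]])
  tabla <- addmargins(tabla)
  print(tabla)
  # Column 2 is the second level of var (Female for Sex); "Sum" holds the row totals
  res <- prop.test(x = c(tabla[1, 2], tabla[2, 2]), n = c(tabla[1, "Sum"], tabla[2, "Sum"]), correct = FALSE)
  print(var)
  print(res)
}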

Thank you

Solution

  • I would suggest working with a function that detects the class of each variable. I have sketched one below, which you can modify as needed. It takes two arguments: the data frame and the name of the variable.

    #Data (the Titanic table ships with base R's datasets package, so no extra library is needed)
    data("Titanic")
    titanic <- as.data.frame(Titanic)
    #Function
    mytest <- function(data,x)
    {
      #Detect the type of var
      if(is.numeric(data[[x]]))
      {
        #Build variables x and y
        a <- data[[x]][data$Survived=='No']
        b <- data[[x]][data$Survived=='Yes']
        #Apply the test
        Res <- t.test(a,b)
        print(Res)
      } else
      {
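        #Create table (this counts rows of the data frame, not the values in Freq)
        tab <- table(data$Survived,data[[x]])
        #Split in a list of vectors, one per level of x
        L1 <- lapply(seq_len(ncol(tab)), function(i) {tab[,i] })
        names(L1) <- dimnames(tab)[[2]]
        #Margins (row totals per Survived group)
        Margins <- rowSums(tab)
        #Test each level: proportion among 'No' vs proportion among 'Yes'
        L2 <- lapply(L1, function(z) {prop.test(x = z, n = Margins, correct = FALSE)})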
        print(L2)
      }
    }
    

    Some examples:

    #Apply the function
    mytest(data = titanic, x = 'Sex')
    mytest(data = titanic, x = 'Freq')
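
    To run a test on every variable in one go, as the question asks, the same idea can be wrapped in a helper that returns its results and is then applied with lapply. This is only a minimal sketch: test_one and its group argument are names made up here for illustration, and like mytest it counts rows of the aggregated data frame, so prop.test may warn about small expected counts.

    ```r
    # Titanic ships with base R (package 'datasets'); no extra library is needed.
    titanic <- as.data.frame(Titanic)

    # Hypothetical helper: choose the test from the column type and return the result.
    test_one <- function(data, x, group = "Survived") {
      g <- data[[group]]
      if (is.numeric(data[[x]])) {
        # Numeric column: Welch t-test between the two groups
        parts <- split(data[[x]], g)
        t.test(parts[[1]], parts[[2]])
      } else {
        # Categorical column: one two-sample proportion test per level
        tab <- table(g, data[[x]])
        n <- as.vector(rowSums(tab))
        out <- lapply(colnames(tab), function(lv) {
          prop.test(x = as.vector(tab[, lv]), n = n, correct = FALSE)
        })
        setNames(out, colnames(tab))
      }
    }

    # Every column except the grouping variable itself
    predictors <- setdiff(names(titanic), "Survived")
    results <- setNames(lapply(predictors, function(v) test_one(titanic, v)), predictors)
    ```

    Returning the htest objects instead of printing them makes it easy to pull out p-values afterwards, e.g. sapply(results$Class, function(r) r$p.value).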
    

    Output:

    mytest(data = titanic, x = 'Sex')
    
    $Male
    
        2-sample test for equality of proportions without continuity correction
    
    data:  z out of Margins
    X-squared = 0, df = 1, p-value = 1
    alternative hypothesis: two.sided
    95 percent confidence interval:
     -0.346476  0.346476
    sample estimates:
    prop 1 prop 2 
       0.5    0.5 
    
    
    $Female
    
        2-sample test for equality of proportions without continuity correction
    
    data:  z out of Margins
    X-squared = 0, df = 1, p-value = 1
    alternative hypothesis: two.sided
    95 percent confidence interval:
     -0.346476  0.346476
    sample estimates:
    prop 1 prop 2 
       0.5    0.5 
    

    Second output:

    mytest(data = titanic, x = 'Freq')
    
    Welch Two Sample t-test
    
    data:  a and b
    t = 1.013, df = 17.768, p-value = 0.3246
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -52.38066 149.75566
    sample estimates:
    mean of x mean of y 
      93.1250   44.4375
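
    One thing to keep in mind with the outputs above: as.data.frame(Titanic) is an aggregated table with one row per combination of Class, Sex, Age and Survived, and the passenger counts sit in Freq. table() therefore counts rows (16 per Survived group), which is why every proportion comes out as 0.5. If passenger-level proportions are what you want, one possible approach, sketched here as an assumption about the intent, is to weight the table by Freq with xtabs:

    ```r
    # Titanic ships with base R (package 'datasets'); no extra library is needed.
    titanic <- as.data.frame(Titanic)

    # Weight each row by Freq so the table holds passenger counts, not row counts.
    tab <- xtabs(Freq ~ Survived + Sex, data = titanic)

    # Passengers per Survived group (No, Yes)
    n_by_group <- as.vector(rowSums(tab))

    # Share of females among non-survivors vs. among survivors
    res_female <- prop.test(x = as.vector(tab[, "Female"]), n = n_by_group, correct = FALSE)
    print(res_female)
    ```

    The same xtabs call with Class or Age in place of Sex gives the other comparisons. Note also that the t-test on Freq compares cell counts rather than a passenger-level measurement, since this dataset has no numeric passenger variable.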