I have some very rudimentary code that runs a chi-squared test of proportions on some of the variables in the Titanic dataset. I would like a way to distinguish categorical from numeric/continuous variables, so that it runs the chi-squared test only on the categorical variables and a t-test on any numeric ones.
I'm interested in comparing multiple levels between the Survived and Not-Survived groups, like so:
proportion of females among Survived vs. proportion of females among Not-Survived, proportion of Class 1 among Survived vs. proportion of Class 1 among Not-Survived, and so on.
The table indices in the code are set up for the Survived/Not-Survived female comparison:
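# Titanic ships with base R's datasets package, so no library() call is needed
titanic <- as.data.frame(Titanic)
names <- names(titanic)
for (var in names) {
  # Cross-tabulate survival against the current variable and add row/column totals
  tabla <- table(titanic$Survived, titanic[[var]])
  tabla <- addmargins(tabla)
  print(tabla)
  # Column 2 = Female and column 3 = Sum only hold when var is 'Sex'
  res <- prop.test(x = c(tabla[1,2], tabla[2,2]), n = c(tabla[1,3], tabla[2,3]), correct = FALSE)
  print(var)
  print(res)
}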
Thank you
I would suggest writing a function that checks the class of the variable and picks the test accordingly. I have sketched one such function, which you can modify if needed. It takes two arguments: the data frame and the name of the variable.
#Data (Titanic ships with base R's datasets package, so no extra library is needed)
data("Titanic")
titanic <- as.data.frame(Titanic)
#Function
mytest <- function(data, x)
{
  #Detect the type of var
  if(is.numeric(data[[x]]))
  {
    #Split the numeric variable by survival status
    a <- data[[x]][data$Survived=='No']
    b <- data[[x]][data$Survived=='Yes']
    #Apply the test (Welch two-sample t-test)
    Res <- t.test(a,b)
    print(Res)
  } else
  {
    #Create table: rows are Survived (No/Yes), columns are the levels of x
    tab <- table(data$Survived,data[[x]])
    #Split in a list of vectors, one per level of x
    L1 <- lapply(seq_len(ncol(tab)), function(i) {tab[,i] })
    names(L1) <- dimnames(tab)[[2]]
    #Margins (row totals per survival group)
    Margins <- rowSums(tab)
    #Test each level: proportion among 'No' vs proportion among 'Yes'
    L2 <- lapply(L1, function(z) {prop.test(x = z, n = Margins, correct = FALSE)})
    print(L2)
  }
}
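The same type check can also be run over the whole data frame up front, so that each column is routed to the right test without naming it by hand. The following is only a minimal sketch: the names `candidates`, `numeric_vars` and `categorical_vars` are illustrative and not part of the function above, and it assumes `Survived` is the grouping variable and should be excluded.

```r
data("Titanic")
titanic <- as.data.frame(Titanic)

# Every column except the grouping variable is a candidate for testing
candidates <- setdiff(names(titanic), "Survived")

# TRUE for numeric columns, FALSE for factors/characters
is_num <- vapply(titanic[candidates], is.numeric, logical(1))

numeric_vars     <- candidates[is_num]   # would go to the t.test branch
categorical_vars <- candidates[!is_num]  # would go to the prop.test branch

print(numeric_vars)
print(categorical_vars)
```

Each name in either vector could then be passed as the `x` argument of the function, for example inside a `for` loop or `lapply`.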
Some examples:
#Apply the function
mytest(data = titanic, x = 'Sex')
mytest(data = titanic, x = 'Freq')
Output:
mytest(data = titanic, x = 'Sex')
$Male
2-sample test for equality of proportions without continuity correction
data: z out of Margins
X-squared = 0, df = 1, p-value = 1
alternative hypothesis: two.sided
95 percent confidence interval:
-0.346476 0.346476
sample estimates:
prop 1 prop 2
0.5 0.5
$Female
2-sample test for equality of proportions without continuity correction
data: z out of Margins
X-squared = 0, df = 1, p-value = 1
alternative hypothesis: two.sided
95 percent confidence interval:
-0.346476 0.346476
sample estimates:
prop 1 prop 2
0.5 0.5
Second output:
mytest(data = titanic, x = 'Freq')
Welch Two Sample t-test
data: a and b
t = 1.013, df = 17.768, p-value = 0.3246
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-52.38066 149.75566
sample estimates:
mean of x mean of y
93.1250 44.4375
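One caveat about the proportions of 0.5 in the first output: `as.data.frame(Titanic)` is an aggregated table with one row per combination of Class, Sex, Age and Survived, and the passenger counts sit in the `Freq` column. `table()` counts rows rather than passengers, so every level appears equally often in both groups. If passenger-level proportions are what is wanted, one possible approach is to weight the cross-tabulation by `Freq` with `xtabs()`. The helper name `weighted_prop_tests` below is hypothetical and the code is a sketch under that assumption, not a replacement for the function above.

```r
data("Titanic")
titanic <- as.data.frame(Titanic)

# Hypothetical helper: same idea as the categorical branch above,
# but each row is weighted by its passenger count in Freq
weighted_prop_tests <- function(data, x) {
  # Builds the formula Freq ~ Survived + <x> and sums Freq within each cell
  tab <- xtabs(reformulate(c("Survived", x), response = "Freq"), data = data)
  # Total passengers per survival group (No / Yes)
  margins <- rowSums(tab)
  # One two-sample proportion test per level of x
  lapply(setNames(colnames(tab), colnames(tab)), function(level) {
    prop.test(x = tab[, level], n = margins, correct = FALSE)
  })
}

res <- weighted_prop_tests(titanic, "Sex")
print(res$Female)
```

Here `prop 1` is the share of females among passengers who did not survive and `prop 2` is the share of females among survivors, which matches the comparison described in the question.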