I'm just getting started with R and I need help performing an independent-samples t-test. I have tried different code but I keep getting errors. The dataset is a pretty big one, provided by my teacher, and it's essentially about how people perceive different types of humor. My task is to find the difference between men (coded as 5) and women (coded as 4) on the imgagg1 variable. Here's what I tried:
Xdata<-Xdata[-c(1,2,311,312,313,614,619,808,815),] # I eliminated these rows because of this error that I keep getting even after removing the rows: In mean.default(x) : argument is not numeric or logical: returning NA
Women<-Xdata[which(Xdata$gender=="4"),"imgagg1"]
Men<-Xdata[which(Xdata$gender=="5"),"imgagg1"]
t.test(Xdata$Women,Xdata$Men)
I get the following errors:
Error in if (stderr < 10 * .Machine$double.eps * max(abs(mx), abs(my))) stop("data are essentially constant") :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In mean.default(x) : argument is not numeric or logical: returning NA
2: In mean.default(y) : argument is not numeric or logical: returning NA
I also tried this, but get the same errors:
Xdata<-Xdata[-c(1,2,311,312,313,614,619,808,815),]
Women<-Xdata%>%
filter(gender=="4")%>%
pull(imgagg1)
Men<-Xdata%>%
filter(gender=="5")%>%
pull(imgagg1)
t.test(Women,Men)
Can somebody please tell me what I'm doing wrong? I've been busting my head over this but can't seem to get it right.
I believe there are two things going on. If the structure of your data is as shown, your numbers are actually stored as characters in R. Also, there may be some confusion in how you call t.test(). You create two separate objects, Men and Women, and then call t.test(Xdata$Women,Xdata$Men). That looks for variables named Women and Men inside the dataset Xdata, but those variables don't exist (Men and Women are their own datasets with one variable, imgagg1).
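Both points can be checked directly in the console. The following is a minimal sketch on a made-up data frame called toy (a stand-in for the real Xdata, with invented values), assuming the columns were read in as text:

```r
# Toy stand-in for Xdata; the values are invented for illustration.
toy <- data.frame(
  gender  = c("4", "5", "4", "5"),
  imgagg1 = c("5", "3", "4", "6"),
  stringsAsFactors = FALSE
)

# Columns stored as text report class "character"; mean() cannot average those.
col_classes <- sapply(toy, class)
print(col_classes)

# Women is a standalone object, not a column of toy.
# (On a plain data.frame this is a vector; on a tibble it is a one-column tibble.)
Women <- toy[toy$gender == "4", "imgagg1"]

# Asking the data frame for a column it does not have returns NULL.
missing_col <- toy$Women
print(is.null(missing_col))
```

If the classes come back as "character", the columns need converting before any test will run.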
To run t.test() on your sample data, I did the following:
Xdata <- structure(list(gender = c(NA, "7", NA, "4", "4", "4", "5", "4", "4", "4", "5", "5", "5", "4", "4", "4", "4", "4", "4", "5", "5", "4", "6", "4", "4"), imgagg1 = c(NA, NA, NA, "5", "5", "4", "3", "4", "1", "5", "4", "5", "6", "7", "4", "6", "3", "1", "5", "2", "5", "6", "5", "7", "2")), row.names = c(NA, 25L), class = c("tbl_df", "tbl", "data.frame"))
# Columns are currently character; convert these two columns to numeric. Note that the numbers here reflect the column positions in this simplified dataset. In the real dataset, you will want to index them as `c(x,y)`, assuming `gender` and `imgagg1` are in column numbers x and y, respectively.
Xdata[1:2] <- lapply(Xdata[1:2], as.numeric)
Women <- Xdata[which(Xdata$gender == 4),"imgagg1"]
Men <- Xdata[which(Xdata$gender == 5),"imgagg1"]
t.test(Women,Men)
# > t.test(Women,Men)
#
# Welch Two Sample t-test
#
# data: Women and Men
# t = 0.21418, df = 12.083, p-value = 0.834
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
# -1.527540 1.860873
# sample estimates:
# mean of x mean of y
# 4.333333 4.166667
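As an alternative to building Women and Men by hand, t.test() also has a formula interface. This is only a sketch on invented numbers (the data frame toy and the labels "Women" and "Men" are my own, not from your dataset), and it assumes the columns are already numeric:

```r
# Invented data: gender codes 4 and 5 plus one other code and one NA.
toy <- data.frame(
  gender  = c(4, 4, 4, 5, 5, 5, 6, NA),
  imgagg1 = c(5, 4, 6, 3, 4, 5, 7, 2)
)

# Keep only the two groups being compared; rows with other codes or NA drop out.
two_groups <- subset(toy, gender %in% c(4, 5))

# Label the codes so the output is easier to read.
two_groups$gender <- factor(two_groups$gender,
                            levels = c(4, 5),
                            labels = c("Women", "Men"))

# Welch two-sample t-test of imgagg1 by gender.
result <- t.test(imgagg1 ~ gender, data = two_groups)
print(result)
```

The formula form needs exactly two groups in the grouping variable, which is why the other gender codes are filtered out first.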
You also don't need to remove the missing data in the step Xdata[-c(1,2,311,312,313,614,619,808,815),]. t.test() drops NA values on its own, and most summary functions accept the argument na.rm = TRUE, which will (as you can guess!) remove the NA values before computing (e.g., sum(x, na.rm = TRUE)).
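One detail worth checking on a small vector: in base R the argument is spelled na.rm, while na.omit() is a separate function. A short sketch with a made-up vector x:

```r
x <- c(1, 2, NA, 4)

# Without NA handling, the result is NA.
plain_total <- sum(x)

# na.rm = TRUE drops the NA before summing or averaging.
total <- sum(x, na.rm = TRUE)
avg   <- mean(x, na.rm = TRUE)

# na.omit is not an argument of sum(); here it is silently treated as one more
# value to add, the NA is not removed, and the result is still NA.
wrong <- sum(x, na.omit = TRUE)

# na.omit() as a function returns the vector with the NA entries removed.
cleaned <- na.omit(x)

print(c(plain_total = plain_total, total = total, avg = avg, wrong = wrong))
```

Since a misspelled argument raises no error in sum(), it is easy to miss.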
Hope this helps and good luck!