
Errors in independent t-test in R


I'm just getting started with R and I need your help on performing an independent sample t-test. I have tried different codes but I keep getting errors. The dataset is a pretty big one, provided by my teacher, and it's essentially about how people perceive different types of humor. My task is to find what the difference between men (coded as 5) and women (coded as 4) is on the imgagg1 variable. Here's what I tried:

Xdata<-Xdata[-c(1,2,311,312,313,614,619,808,815),] # I eliminated these rows because of this error that I keep getting even after removing the rows: In mean.default(x) : argument is not numeric or logical: returning NA

Women<-Xdata[which(Xdata$gender=="4"),"imgagg1"]

Men<-Xdata[which(Xdata$gender=="5"),"imgagg1"]

t.test(Xdata$Women,Xdata$Men)

I get the following errors:

Error in if (stderr < 10 * .Machine$double.eps * max(abs(mx), abs(my))) stop("data are essentially constant") : 
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In mean.default(x) : argument is not numeric or logical: returning NA
2: In mean.default(y) : argument is not numeric or logical: returning NA

I also tried this, but get the same errors:

Xdata<-Xdata[-c(1,2,311,312,313,614,619,808,815),]
Women<-Xdata%>%
  filter(gender=="4")%>%
  pull(imgagg1)
Men<-Xdata%>%
  filter(gender=="5")%>%
  pull(imgagg1)
t.test(Women,Men)

Can somebody please tell me what I'm doing wrong? I've been busting my head over this but can't seem to get it right.


Solution

  • I believe there are two things going on. First, if the structure of the data you posted is correct, R is treating your numbers as characters. Second, there is a mix-up in how t.test() is called. You create two separate objects, Men and Women, but then call t.test(Xdata$Women,Xdata$Men). That call looks for variables named Women and Men inside the dataset Xdata, and those variables don't exist (Men and Women are their own objects, each holding the imgagg1 values for one group).
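
    Both points can be checked before calling t.test(). Here is a minimal sketch on a small made-up data frame (`toy` is hypothetical, not your real dataset) whose columns are stored as text, as in the question:

    ```r
    # Hypothetical toy data that mimics the question: numbers stored as text
    toy <- data.frame(
      gender  = c("4", "5", "4", "5", NA),
      imgagg1 = c("5", "3", "4", "6", NA),
      stringsAsFactors = FALSE
    )

    # Both columns are character, which is why mean() warns and returns NA
    col_classes <- sapply(toy, class)
    print(col_classes)

    # A plain data.frame has no column called Women, so `$` returns NULL
    missing_col <- toy$Women
    print(is.null(missing_col))

    # as.numeric() keeps NA as NA and makes the remaining values usable
    toy$imgagg1 <- as.numeric(toy$imgagg1)
    converted_mean <- mean(toy$imgagg1, na.rm = TRUE)
    print(converted_mean)
    ```

    On your own data, running str(Xdata) should show the same thing: chr next to gender and imgagg1 means they need converting first.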

    To run t.test() on your sample data, I did the following:

    Xdata <- structure(list(gender = c(NA, "7", NA, "4", "4", "4", "5", "4",  "4", "4", "5", "5", "5", "4", "4", "4", "4", "4", "4", "5", "5",  "4", "6", "4", "4"), imgagg1 = c(NA, NA, NA, "5", "5", "4", "3",  "4", "1", "5", "4", "5", "6", "7", "4", "6", "3", "1", "5", "2",  "5", "6", "5", "7", "2")), row.names = c(NA, 25L), class = c("tbl_df",  "tbl", "data.frame"))
    
    # Columns are currently character; convert these two columns to numeric.
    # Note that the numbers here reflect the column positions in this simplified dataset.
    # In the real dataset, identify them as `c(x, y)`, where `gender` and `imgagg1`
    # are in column numbers x and y, respectively.
    Xdata[1:2] <- lapply(Xdata[1:2], as.numeric)
    
    Women <- Xdata[which(Xdata$gender == 4),"imgagg1"]
    
    Men <- Xdata[which(Xdata$gender == 5),"imgagg1"]
    
    t.test(Women,Men)
    
    # > t.test(Women,Men)
    # 
    # Welch Two Sample t-test
    # 
    # data:  Women and Men
    # t = 0.21418, df = 12.083, p-value = 0.834
    # alternative hypothesis: true difference in means is not equal to 0
    # 95 percent confidence interval:
    #   -1.527540  1.860873
    # sample estimates:
    #   mean of x mean of y 
    # 4.333333  4.166667  
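
    As an aside, the same comparison can be written with the formula interface of t.test(), which avoids building Women and Men by hand. This is only a sketch of an alternative, not what I ran above. The data frame `d` is an assumption: it re-enters the sample values as numbers and leaves out the first three rows, where imgagg1 is missing. The formula method needs exactly two groups, so the single gender == 6 row has to go first:

    ```r
    # Sample values from the data above, already numeric (4 = women, 5 = men)
    d <- data.frame(
      gender  = c(4, 4, 4, 5, 4, 4, 4, 5, 5, 5, 4, 4, 4, 4, 4, 4, 5, 5, 4, 6, 4, 4),
      imgagg1 = c(5, 5, 4, 3, 4, 1, 5, 4, 5, 6, 7, 4, 6, 3, 1, 5, 2, 5, 6, 5, 7, 2)
    )

    # Keep only the two groups being compared
    two_groups <- d[d$gender %in% c(4, 5), ]

    # Welch two-sample t-test of imgagg1 by gender (var.equal = FALSE is the default)
    res <- t.test(imgagg1 ~ gender, data = two_groups)
    print(res)
    ```

    The groups are ordered by their codes, so group 4 (women) is reported first, matching t.test(Women,Men).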
    

    You also don't need to remove the missing data by hand in the step Xdata[-c(1,2,311,312,313,614,619,808,815),]. t.test() drops NA values on its own, and most summary functions accept the argument na.rm = TRUE, which will (as you can guess!) remove the NA values before computing, e.g. sum(x, na.rm = TRUE).
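
    For reference, the argument that sum() and mean() use for this is spelled na.rm. A short sketch with made-up vectors (x, a and b are illustrative only):

    ```r
    # Made-up values with one missing entry
    x <- c(5, 4, NA, 3)

    # Without na.rm the NA propagates into the result
    total_with_na <- sum(x)
    print(total_with_na)

    # With na.rm = TRUE the NA is dropped before computing
    total <- sum(x, na.rm = TRUE)
    avg <- mean(x, na.rm = TRUE)
    print(c(total, avg))

    # t.test() removes NA values from each vector itself, no extra argument needed
    a <- c(5, 4, NA, 3, 6)
    b <- c(2, NA, 4, 3, 5)
    res_na <- t.test(a, b)
    print(res_na$estimate)
    ```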

    Hope this helps and good luck!