Performing a t-test in R with categorical variables


Hey guys, I am trying to run a t-test, but something seems to be wrong. The data look like this:

pot pair    type    height
I   1   Cross   23,5
I   1   Self    17,375
I   2   Cross   12
I   2   Self    20,375

I performed the t-test as follows:

    darwin <- read.table("darwin.txt", header=T)
    plot(darwin$type, darwin$height, ylab="Height")
    darwin.no.outlier = subset(darwin, height>13)
    tapply(darwin.no.outlier$height, darwin.no.outlier$type, var) 
    t.test(darwin$height ~ darwin$type)

R gives me the following error:

    Error in if (stderr < 10 * .Machine$double.eps * max(abs(mx), abs(my))) stop("data are essentially constant") : 
      missing value where TRUE/FALSE needed
    In addition: Warning messages:
    1: In mean.default(x) : argument is not numeric or logical: returning NA
    2: In var(x) :
      Calling var(x) on a factor x is deprecated and will become an error.
      Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
    3: In mean.default(y) : argument is not numeric or logical: returning NA
    4: In var(y) :
      Calling var(x) on a factor x is deprecated and will become an error.
      Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.

Solution

  • The problem is the decimal mark in your height column, which is a comma instead of a dot. Because of the comma, read.table cannot parse the column as numeric and reads it as a factor (or as character in R 4.0 and later), hence the error you get.

    When importing the data, pass dec = "," (the character used in the file for decimal points) to read.table. An example with your data:

    darwin <- read.table(text = "pot pair    type    height
               I   1   Cross   23,5
               I   1   Self    17,375
               I   2   Cross   12
               I   2   Self    20,375", header = TRUE, dec = ",")
    

    The output of

    t.test(darwin$height ~ darwin$type)
    

    is:

        Welch Two Sample t-test
    
    data:  darwin$height by darwin$type
    t = -0.18932, df = 1.1355, p-value = 0.878
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -58.34187  56.09187
    sample estimates:
    mean in group Cross  mean in group Self 
                 17.750              18.875
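
If the data are already loaded, it may be quicker to check and repair the column than to re-import the file. The following is only a sketch under that assumption: `txt`, `bad` and `good` are hypothetical names used for illustration, and the four rows are the sample from the question. It shows how to confirm that height was not read as numeric, how to convert it in place, and that the result matches what dec = "," produces.

```r
# The four sample rows from the question, with comma decimals.
txt <- "pot pair type height
I 1 Cross 23,5
I 1 Self 17,375
I 2 Cross 12
I 2 Self 20,375"

# Without dec = ",", height is not parsed as numeric.
bad <- read.table(text = txt, header = TRUE)
print(class(bad$height))  # factor in older R, character in R >= 4.0

# Repair in place: convert to character first (so factor codes are not
# returned by mistake), replace the comma with a dot, then convert.
bad$height <- as.numeric(sub(",", ".", as.character(bad$height), fixed = TRUE))

# Reading with dec = "," yields a numeric column directly.
good <- read.table(text = txt, header = TRUE, dec = ",")
print(is.numeric(good$height))
print(all.equal(bad$height, good$height))
```

Going through as.character() matters mainly on older R versions, where calling as.numeric() directly on a factor returns the internal level codes rather than the values.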