
Student t.test for the median (not the mean)


I have a data frame containing a column of salaries. I would like to calculate a 97% confidence interval around the median. t.test gives a confidence interval for the mean, not the median. Do you know how I could do this? This is the output of t.test on my column (at the default 95% level):

t.test(Salary)
One Sample t-test
data:  Salary
t = 26.131, df = 93, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
37235.65 43360.56
sample estimates:
mean of x 
40298.1 

Although the median is:

median(na.omit(Salary))
[1] 36000

Thanks


Solution

  • If your data are paired, you can do a simple sign test, which is essentially a binomial test. You count the pairs in which the sample from one population is larger than the sample from the other, and test that success/failure rate. Note that the confidence interval binom.test reports is for that proportion, not for a median.

    set.seed(1)
    
    x2 <- runif(30, 0.5, 2)^2
    y2 <- runif(30, 0.5, 2)^2 + 0.5
    
    bino <- x2 < y2
    
    binom.test(sum(bino), length(bino), conf.level=0.97)
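
    The question asks about a single sample rather than pairs. The same binomial reasoning gives a distribution-free confidence interval for one sample's median, built from its order statistics. A minimal sketch, assuming a hypothetical helper `median_ci()` (not part of base R) and simulated salaries standing in for the real column:

    ```r
    # Hypothetical helper: sign-test-based confidence interval for the median.
    median_ci <- function(x, conf.level = 0.97) {
      x <- sort(x[!is.na(x)])
      n <- length(x)
      alpha <- 1 - conf.level
      # The number of observations below the true median is Binomial(n, 0.5).
      # qbinom gives the smallest j with P(B <= j) >= alpha / 2,
      # so P(B <= j - 1) < alpha / 2 and the interval is conservative.
      j <- qbinom(alpha / 2, n, 0.5)
      if (j < 1) stop("sample too small for this confidence level")
      # Coverage actually achieved by [x_(j), x_(n - j + 1)]
      achieved <- 1 - 2 * pbinom(j - 1, n, 0.5)
      list(lower = x[j], upper = x[n - j + 1], achieved = achieved)
    }

    set.seed(1)
    # Simulated data, NOT the asker's Salary column
    salary <- round(rlnorm(94, meanlog = 10.5, sdlog = 0.4))
    ci <- median_ci(salary, conf.level = 0.97)
    ci
    ```

    Because the binomial distribution is discrete, the achieved coverage is usually somewhat above the requested 97%; the `achieved` element reports it.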
    

    If your data aren't paired, you can perform a Mann-Whitney test, which is a test on ranks. Roughly, it counts how often a sample from one population is larger than a sample from the other, over all cross-population pairs.

    x <- c(80, 83, 189, 104, 145, 138, 191, 164, 73, 146, 124, 181)*1000
    y <- c(115, 88, 90, 74, 121, 133, 97, 101, 81)*1000
    
    wilcox.test(x, y, conf.int=TRUE, conf.level=0.97)
    

    There's also a paired variant of the Mann-Whitney test called the Wilcoxon signed rank test, which can be an alternative to the simple sign test.

    wilcox.test(x2, y2, paired=TRUE, conf.int=TRUE, conf.level=0.97)
    

    The Wilcoxon signed rank test assumes that the paired differences are symmetric around their median; the simple sign test doesn't. Something to keep in mind. Also, if you want to interpret the Mann-Whitney test as a difference in medians, you'll have to assume that the two populations have the same shape and differ only by a shift in location.
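
    To see why the shift assumption matters: for two samples, the estimate that wilcox.test reports is the median of all pairwise differences between x and y (the Hodges-Lehmann estimator), which in general is not the difference of the two sample medians. A small sketch reusing the x and y from the Mann-Whitney example; the variable names `hl` and `diff_medians` are just for illustration:

    ```r
    x <- c(80, 83, 189, 104, 145, 138, 191, 164, 73, 146, 124, 181) * 1000
    y <- c(115, 88, 90, 74, 121, 133, 97, 101, 81) * 1000

    # Median of all pairwise differences x_i - y_j (Hodges-Lehmann estimate)
    hl <- median(outer(x, y, "-"))

    # Plain difference of the two sample medians
    diff_medians <- median(x) - median(y)

    fit <- wilcox.test(x, y, conf.int = TRUE, conf.level = 0.97)

    # The reported "difference in location" matches hl, not diff_medians
    c(wilcox = unname(fit$estimate), hl = hl, diff_medians = diff_medians)
    ```

    The two numbers only target the same quantity when the populations really differ by a pure location shift.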


    A radically different approach would be to bootstrap the difference in medians.
    A naïve implementation:

    set.seed(1)
    rr <- replicate(
      1e3, 
      median(sample(x, length(x), replace=TRUE)) -
      median(sample(y, length(y), replace=TRUE))
    )
    
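    # Jitter only to smooth the density plot; the bootstrap medians take few distinct values
    plot(density(jitter(rr, 50)))
    # Take the 97% percentile interval from the raw bootstrap values, not the jittered ones
    qu <- quantile(rr, probs=c((1-0.97)/2, 1 - (1-0.97)/2))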
    abline(v=qu, col="blue")
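
    The same percentile bootstrap also works for the single-sample case in the question, a 97% interval around one median. A rough sketch, assuming a hypothetical helper `boot_median_ci()` and simulated salaries in place of the real Salary column:

    ```r
    # Hypothetical helper: percentile bootstrap confidence interval for a median.
    boot_median_ci <- function(x, conf.level = 0.97, n_boot = 2000) {
      x <- x[!is.na(x)]  # drop missing values, as na.omit() does in the question
      alpha <- 1 - conf.level
      # Resample with replacement and recompute the median each time
      meds <- replicate(n_boot, median(sample(x, length(x), replace = TRUE)))
      quantile(meds, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
    }

    set.seed(1)
    # Simulated data, NOT the asker's Salary column
    salary <- round(rlnorm(94, meanlog = 10.5, sdlog = 0.4))
    ci <- boot_median_ci(salary)
    ci
    ```

    With real data this would be called as `boot_median_ci(Salary)`. The percentile method is the simplest choice here; other bootstrap intervals exist and may behave better for small samples.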
    
