I have a data frame containing a column of salaries. I would like to calculate a 97% confidence interval around the median. t.test works with the mean, not the median. Do you know how I could do this? This is the output of t.test on my column:
t.test(Salary)
One Sample t-test
data: Salary
t = 26.131, df = 93, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
37235.65 43360.56
sample estimates:
mean of x
40298.1
Although the median is:
median(na.omit(Salary))
[1] 36000
Thanks
If your data are paired, you can do a simple sign test, which is essentially a binomial test. You count the pairs in which the sample from one population is larger than the sample from the other, and test the resulting success/failure rate.
set.seed(1)
x2 <- runif(30, 0.5, 2)^2
y2 <- runif(30, 0.5, 2)^2 + 0.5
bino <- x2 < y2
binom.test(sum(bino), length(bino), conf.level=0.97)
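The reduction to a binomial test can also be spelled out on the paired differences. This is a minimal sketch reusing the simulated x2 and y2 from above; the names d and res are only illustrative, and dropping zero differences is the usual convention for ties, not something binom.test does for you:

```r
set.seed(1)
x2 <- runif(30, 0.5, 2)^2
y2 <- runif(30, 0.5, 2)^2 + 0.5

# Sign test by hand: only the sign of each paired difference is used
d <- y2 - x2
d <- d[d != 0]            # zero differences carry no sign, so drop them

# Under H0 (median difference = 0) the number of positive signs is Binomial(n, 0.5)
res <- binom.test(sum(d > 0), length(d), p = 0.5, conf.level = 0.97)
res$p.value
res$conf.int              # 97% interval for P(y2 > x2), not for a median
```

Note that the interval returned here is for the proportion of pairs with y2 > x2, so it tells you about the direction of the difference, not its size.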
If your data aren't paired, you can perform a Mann-Whitney test, which is a test on ranks. It compares every sample from one population with every sample from the other and counts how often the first is the larger one, and the reverse.
x <- c(80, 83, 189, 104, 145, 138, 191, 164, 73, 146, 124, 181)*1000
y <- c(115, 88, 90, 74, 121, 133, 97, 101, 81)*1000
wilcox.test(x, y, conf.int=TRUE, conf.level=0.97)
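To make the counting concrete, here is a small sketch that does the pairwise comparison by hand. As far as I know, the W statistic that wilcox.test reports for two samples is exactly this count; w_manual is just an illustrative name:

```r
x <- c(80, 83, 189, 104, 145, 138, 191, 164, 73, 146, 124, 181)*1000
y <- c(115, 88, 90, 74, 121, 133, 97, 101, 81)*1000

# Count the (x, y) pairs in which the x value is the larger one;
# ties would count one half each (there are none in this data)
w_manual <- sum(outer(x, y, ">")) + 0.5 * sum(outer(x, y, "=="))

res <- wilcox.test(x, y, conf.int=TRUE, conf.level=0.97)
w_manual
unname(res$statistic)     # the W reported by wilcox.test

# The "reverse" count: pairs in which the y value is the larger one
length(x) * length(y) - w_manual
```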
There's also a paired variant of the Mann-Whitney test called the Wilcoxon signed rank test, which can be an alternative to the simple sign test.
wilcox.test(x2, y2, paired=TRUE, conf.int=TRUE, conf.level=0.97)
The Wilcoxon signed rank test assumes that the paired differences are symmetric around their median; the simple sign test doesn't. Something to keep in mind. Also, if you want to interpret the Mann-Whitney test as a difference in medians, you'll have to assume that the two populations have the same shape and differ only by a shift in location.
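The last point is easy to check on the unpaired example. According to the wilcox.test documentation, the "difference in location" it reports is the median of all pairwise differences, which need not equal the difference of the two medians. A short sketch, with pair_diffs as an illustrative name:

```r
x <- c(80, 83, 189, 104, 145, 138, 191, 164, 73, 146, 124, 181)*1000
y <- c(115, 88, 90, 74, 121, 133, 97, 101, 81)*1000

res <- wilcox.test(x, y, conf.int=TRUE, conf.level=0.97)

# All pairwise differences x_i - y_j, flattened to a plain vector
pair_diffs <- as.vector(outer(x, y, "-"))

median(pair_diffs)        # median of the pairwise differences
unname(res$estimate)      # "difference in location" from wilcox.test
median(x) - median(y)     # difference of the medians, for comparison
```

Only under the pure location-shift assumption do the two quantities target the same thing.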
A radically different approach would be to bootstrap the difference in medians.
A naïve implementation:
set.seed(1)
rr <- replicate(
  1e3,
  median(sample(x, length(x), replace=TRUE)) -
    median(sample(y, length(y), replace=TRUE))
)
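# Take the percentile interval from the raw bootstrap replicates
qu <- quantile(rr, probs=c((1-0.97)/2, 1 - (1-0.97)/2))
# Jitter only for display: the replicates take few distinct values,
# and jittering them before computing quantiles would shift the interval
plot(density(jitter(rr, 50)))
abline(v=qu, col="blue")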
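The same resampling idea works for a single sample, which is closer to the original question about one Salary column. This is only a sketch: the salary vector below is simulated (94 values, matching df = 93 in the t.test output) because the real data isn't shown, and the plain percentile interval is the simplest bootstrap interval, not necessarily the best one:

```r
set.seed(1)

# Hypothetical right-skewed salaries standing in for the real Salary column
salary <- round(rlnorm(94, meanlog = log(36000), sdlog = 0.4))

# Resample with replacement and record the median of each resample
boot_med <- replicate(1e4, median(sample(salary, replace = TRUE)))

# Percentile interval at the 97% level
conf_level <- 0.97
alpha <- 1 - conf_level
ci <- quantile(boot_med, probs = c(alpha / 2, 1 - alpha / 2))

median(salary)
ci
```

With the real data you would replace salary with na.omit(Salary).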