I am using the quantreg package to predict quantiles and their confidence intervals. I can't understand why the predicted quantiles are different from the quantiles calculated directly from the data using quantile().
library(tidyverse)
library(quantreg)
data <- tibble(data=runif(10)*10)
qr1 <- rq(formula=data ~ 1, tau=0.9, data=data) # quantile regression
yqr1<- predict(qr1, newdata=tibble(data=c(1)), interval='confidence', level=0.95, se='boot') # predict quantile
q90 <- quantile(data$data, 0.9) # quantile of sample
> yqr1
fit lower higher
1 6.999223 3.815588 10.18286
> q90
90%
7.270891
You should realize that predicting the 90th percentile for a dataset with only 10 items is really based solely on the two highest values. You should also review the help page for quantile, where you will find multiple definitions of the term.
When I run this, I see:
yqr1<- predict(qr1, newdata=tibble(data=c(1)) )
yqr1
1
8.525812
And when I look at the data I see:
data
# A tibble: 10 x 1
data
<dbl>
1 8.52581158
2 7.73959380
3 4.53000680
4 0.03431813
5 2.13842058
6 5.60713159
7 6.17525537
8 8.76262959
9 5.30750304
10 4.61817190
So the rq function is estimating the second highest value as the 90th percentile, which seems perfectly reasonable. The quantile result is not actually estimated that way:
quantile(data$data, .9)
# 90%
#8.549493
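To make the difference concrete, here is a minimal sketch that reproduces the number above by hand. It assumes the default interpolation rule of quantile() (type = 7 on the help page) and re-enters the ten values printed in the tibble; the variable names are only illustrative.

```r
# The ten values printed in the tibble above, re-entered by hand
x <- c(8.52581158, 7.73959380, 4.53000680, 0.03431813, 2.13842058,
       5.60713159, 6.17525537, 8.76262959, 5.30750304, 4.61817190)

s <- sort(x)        # order statistics
n <- length(s)
p <- 0.9

# Default rule (type = 7): fractional position 1 + (n - 1) * p in the sorted sample
pos  <- 1 + (n - 1) * p   # 9.1 for n = 10, p = 0.9
lo   <- floor(pos)        # 9, i.e. the second highest value
frac <- pos - lo          # roughly 0.1

# Linear interpolation between the 9th and 10th order statistics
manual  <- s[lo] + frac * (s[lo + 1] - s[lo])
builtin <- quantile(x, p, names = FALSE)

print(c(manual = manual, builtin = builtin))
```

Only the two largest values enter the calculation: the result is the second highest value plus about a tenth of the gap to the highest one.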
?quantile
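As a rough illustration of those multiple definitions, the sketch below evaluates all nine type values documented on the help page for the same ten numbers. With type = 1 (the inverse of the empirical CDF, no interpolation) the result for this sample should be the second highest value, which coincides with the rq fit shown above. That is an observation about this particular sample, not a claim about how rq works internally.

```r
# Same ten values as in the tibble above
x <- c(8.52581158, 7.73959380, 4.53000680, 0.03431813, 2.13842058,
       5.60713159, 6.17525537, 8.76262959, 5.30750304, 4.61817190)
s <- sort(x)

# Default definition (type = 7) interpolates between order statistics
q7 <- quantile(x, 0.9, names = FALSE)

# type = 1 picks an observed value instead of interpolating
q1 <- quantile(x, 0.9, type = 1, names = FALSE)

# All nine definitions side by side
all_types <- sapply(1:9, function(t) quantile(x, 0.9, type = t, names = FALSE))
names(all_types) <- paste0("type", 1:9)
print(all_types)
```

With only 10 observations the definitions can disagree noticeably in the tails; the differences shrink as the sample grows.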