Search code examples
rregressionanovaquantilequantreg

anova.rq() in quantreg package in R


I'm interested in comparing estimates from different quantiles (same outcome, same covariates) using anova.rqlist function called by anova in the environment of the quantreg package in R. However the math in the function is beyond my rudimentary expertise. Lets say i fit 3 models at different quantiles;

library(quantreg)
data(Mammals) # data in quantreg to be used as a useful example
fit1 <- rq(weight ~ speed + hoppers + specials, tau = .25, data = Mammals)
fit2 <- rq(weight ~ speed + hoppers + specials, tau = .5, data = Mammals)
fit3 <- rq(weight ~ speed + hoppers + specials, tau = .75, data = Mammals)

Then i compare them using;

anova(fit1, fit2, fit3, test="Wald", joint=FALSE)

My question is which is of these models is being used as the basis of the comparison?

My understanding of the Wald test (wiki entry)

enter image description here

where θ^ is the estimate of the parameter(s) of interest θ that is compared with the proposed value θ0.

So my question is what is the anova function in quantreg choosing as the θ0?

Based on the pvalue returned from the anova my best guess is that it is choosing the lowest quantile specified (ie tau=0.25). Is there a way to specify the median (tau = 0.5) or better yet the mean estimate from obtained using lm(y ~ x1 + x2 + x3, data)?

anova(fit1, fit2, fit3, joint=FALSE)

actually produces

Quantile Regression Analysis of Deviance Table

Model: weight ~ speed + hoppers + specials
Tests of Equality of Distinct Slopes: tau in {  0.25 0.5 0.75  }

             Df Resid Df F value  Pr(>F)  
speed         2      319  1.0379 0.35539  
hoppersTRUE   2      319  4.4161 0.01283 *
specialsTRUE  2      319  1.7290 0.17911  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

while

anova(fit3, fit1, fit2, joint=FALSE)

produces the exact same result

Quantile Regression Analysis of Deviance Table

Model: weight ~ speed + hoppers + specials
Tests of Equality of Distinct Slopes: tau in {  0.5 0.25 0.75  }

             Df Resid Df F value  Pr(>F)  
speed         2      319  1.0379 0.35539  
hoppersTRUE   2      319  4.4161 0.01283 *
specialsTRUE  2      319  1.7290 0.17911  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The order of the models is clearly being changed in the anova, but how is it that the F value and Pr(>F) are identical in both tests?


Solution

  • All the quantiles you input are used and there is not one model used as a reference.

    I suggest you read this post and the related answer to understand what your "theta.0" is.

    I believe what you are trying to do is to test whether the regression lines are parallel. In other words whether the effects of the predictor variables (only income here) are uniform across quantiles.

    You can use the anova() from the quantreg package to answer this question. You should indeed use several fits for each quantile.

    When you use joint=FALSE as you did, you get coefficient-wise comparisons. But you only have one coefficient so there is only one line! And your results tells you that the effect of income is not uniform accross quantiles in your example. Use several predictor variables and you will get several p-values.

    You can do an overall test of equality of the entire sets of coefficients if you do not use joint=FALSE and that would give you a "Joint Test of Equality of Slopes" and therefore only one p-value.

    EDIT:

    I think theta.0 is the average slope for all 'tau' values or the actual estimate from 'lm()', rather than a specific slope of any of the models. My reasoning is that 'anova.rq()' does not require any specific low value of 'tau' or even the median 'tau'.

    There are several ways to test this. Either do the calculations by hand with theta.0 being equal to the average value, or compare many combinations because then you could a situation where certain of your models are close to the model with a low 'tau' values but not to the 'lm()' value. So if theta.0 is the slope of the first model with lowest 'tau' then your Pr(>F) will be high whereas in the other case, it will be low.

    This question should maybe have been asked on cross-validated.