Search code examples
rregressionrobustness

Address unequal variance between groups before applying contrasts for a linear model? (r)


My Goal: I have an ordinal factor variable (5 levels) to which I would like to apply contrasts to test for a linear trend. However, the factor groups have heterogeneity of variance.

What I've done: Upon recommendation, I used lmRob() from robust pckg to create a robust linear model, then applied the contrasts.

# assign the codes for a linear contrast of 5 groups, save as object
contrast5 <- contr.poly(5)

# set contrast property of sf1 to contain the weights
contrasts(SCI$sf1) <- contrast5

# fit and save a robust model (exhaustive instead of subsampling)
robmod.sf1 <- lmRob(ICECAP_A ~ sf1, data = SCI, nrep = Exhaustive)

summary.lmRob(robmod.sf1)

My problem: I have since been reading that robust regression is more suited to address outliers, and not heterogeneity of variance. (bottom of https://stats.idre.ucla.edu/r/dae/robust-regression/_ ) This UCLA page (among others) suggests the sandwich package to get heteroskedastic-consistent (HC) standard errors (such as in https://thestatsgeek.com/2014/02/14/the-robust-sandwich-variance-estimator-for-linear-regression-using-r/ ).

But these examples use a series of functions/calls to generate output that gives you the HC that could be used to calculate confidence intervals, t-values, p-values etc.

My thinking is that if I use vcovHC(), I could get the HC std errors, but the HC std errors would not have been 'applied'/a property of the model, so I couldn't pass the model (with the HC errors) through a function to apply the contrasts that I ultimately want. I hope I am not conflating two separate concepts, but surely if a function addresses/down-weights outliers, that should at least somewhat address unequal variances as well?

Can anyone confirm if my reasoning is sound (and thus remain with lmRob()? Or suggest how I could just correct my standard errors and still apply the contrasts?


Solution

  • vcovHC is the right function to deal with heteroscedasticity. HC stands for heteroscedasticity-consistent estimator. This will not downweight outliers in estimates of model effects, but it will calculated the CIs and p-values differently to accommodate the impact of such outlying observations. lmRob does downweight outlying values and does not handle heteroscedasticity

    See more here: https://stats.stackexchange.com/questions/50778/sandwich-estimator-intuition/50788#50788