
Negative adjusted R squared in svyglm


I have a large, weighted, stratified survey dataset that contains, among other things, CO2 emissions per household. The variable total_co2 equals the sum elec_co2 + gas_co2 + oil_co2. Before moving on to more complex relationships, I tried to fit a seemingly obvious model.

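library(survey)  # svydesign(), svyglm()
library(jtools)  # summ()

# Stratified design with household weights
svy <- svydesign(id=~i_psu,
                 strata=~i_strata,
                 weights=~i_hhdenui_xw,
                 data=df1)

# Survey-weighted linear regression of total on electricity emissions
model <- svyglm(total_co2 ~ elec_co2, design=svy)
summ(model)
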
MODEL INFO:
Observations: 6826
Dependent Variable: total_co2
Type: Survey-weighted linear regression 

MODEL FIT:
R² = 0.31
Adj. R² = -1.74 

Standard errors: Robust
---------------------------------------------------
                       Est.    S.E.   t val.      p
----------------- --------- ------- -------- ------
(Intercept)         1962.48   83.68    23.45   0.00
elec_co2               1.27    0.05    23.98   0.00
---------------------------------------------------

Estimated dispersion parameter = 4390445

While the R² of 0.31 indicates some explanatory power, the adjusted R² is negative (-1.74), which is usually interpreted as the model explaining nothing. How is this possible for such a simple relationship? Where does the negative value come from, and how should I interpret it?

Here is a simple plot of the data. [plot not shown]


Solution

  • The negative adjusted R² values are due to a bug in the summ() function of jtools, not to the model or the data. Further information: https://github.com/jacob-long/jtools/issues/112
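
As a rough sanity check, the reported numbers can be plugged into the textbook adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - p - 1). This is only a sketch: the helper name adj_r2 is made up for illustration, and it assumes the ordinary (non-survey) formula with n taken as the number of observations, which may not be exactly what jtools intends for survey designs. With one predictor and 6826 observations, the adjustment should barely move R².

```r
# Textbook adjusted R-squared (hypothetical helper, not part of jtools or survey).
# r2: unadjusted R-squared, n: number of observations,
# p: number of predictors, excluding the intercept.
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Values copied from the summ() output in the question.
adj <- adj_r2(r2 = 0.31, n = 6826, p = 1)
print(round(adj, 2))
```

Under these assumptions the result stays close to the unadjusted R² of 0.31, so a value of -1.74 cannot come from this formula with the sample size shown, which is consistent with the reported value being a software bug rather than a property of the model.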