I am fitting a logistic regression
and would like to test the statistical significance of my overall model.
Now, McFadden's pseudo-R-squared, R² = 1 - L(c)/L(null),
is often read as an analogue of the variance explained by the model - where L(c)
denotes the maximized log-likelihood of the fitted model and L(null)
denotes the corresponding value for the null model (no covariates, only an intercept).
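For reference, here is a minimal sketch of how that pseudo-R-squared can be computed, assuming a statsmodels Logit fit (X and y below are only placeholders for the real design matrix and binary outcome):

```python
import numpy as np
import statsmodels.api as sm

# placeholder data standing in for the real covariates and binary outcome
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (rng.random(500) < 0.5).astype(int)

result = sm.Logit(y, sm.add_constant(X)).fit(disp=0)

ll_model = result.llf    # maximized log-likelihood of the fitted model, L(c)
ll_null = result.llnull  # log-likelihood of the intercept-only model, L(null)

mcfadden_r2 = 1 - ll_model / ll_null
print(mcfadden_r2, result.prsquared)  # result.prsquared reports the same quantity
```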
The likelihood-ratio test statistic is LR = 2 * (L(c) - L(null)),
which follows a Chi-squared
distribution and can be tested for significance using the model's degrees of freedom (the number of covariates).
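Continuing the sketch above, the likelihood-ratio test and its p-value can be obtained like this (using scipy's chi-squared survival function for the upper tail):

```python
from scipy import stats

# LR = 2 * (L(c) - L(null)) on the log-likelihood scale
lr_stat = 2 * (ll_model - ll_null)
df = result.df_model                  # number of covariates
p_value = stats.chi2.sf(lr_stat, df)

print(lr_stat, p_value)
# statsmodels exposes the same numbers as result.llr and result.llr_pvalue
```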
Anyway, I use the Chi-squared statistic
to calculate a p-value,
which is highly significant, but the pseudo-R-squared
is only around 0.021.
Why do the R-squared and the overall p-value differ so much?
Using an accuracy calculation on some test data, metrics.accuracy_score(y_test, y_pred),
I see that the accuracy for the test data is only around 55% (for the training data it is around 60%).
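For context, those accuracy numbers come from something along these lines (a sketch assuming scikit-learn; X and y again stand in for the real features and labels):

```python
import numpy as np
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# placeholder data standing in for the real features and labels
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = (rng.random(1000) < 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
y_pred = clf.predict(X_test)

print("test accuracy:", metrics.accuracy_score(y_test, y_pred))
print("train accuracy:", metrics.accuracy_score(y_train, clf.predict(X_train)))
```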
Can someone help me to interpret my results?
Maybe there is a correlation that is significant, but its impact is still small: since you are doing classification, you could check whether examples with this variable (= 1 in the binary case) have a slightly higher or lower probability of belonging to class 1 than those without it (= 0 in the binary case). For example:
examples with the variable equal to 1 have a 50% chance of belonging to class 1, while examples with the variable equal to 0 have a 48% chance.
If there are many examples with that variable, the effect can still be significant (low p-value), but the variable alone will hardly predict the right class, i.e. it explains little of the variance (low R-squared).
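A quick way to run that check is to compare the empirical class-1 rate within each group, e.g. with pandas (a sketch; df, "x" and "y" are placeholder names for your data frame, the binary covariate and the class label):

```python
import pandas as pd

# placeholder data: "x" is the binary covariate, "y" is the class label
df = pd.DataFrame({
    "x": [1, 1, 0, 0, 1, 0, 1, 0],
    "y": [1, 0, 1, 0, 1, 1, 0, 0],
})

# empirical P(y = 1 | x = 1) vs. P(y = 1 | x = 0)
print(df.groupby("x")["y"].mean())
```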
This reference might help you understand the situation graphically for another problem: https://blog.minitab.com/blog/adventures-in-statistics-2/how-to-interpret-a-regression-model-with-low-r-squared-and-low-p-values