python, logistic-regression, statsmodels

Statsmodels Logit: All Features Insignificant, How Should I Transform Them?


I am working on the UCI Parkinson's dataset (https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/). When I fit a logistic regression with statsmodels, every feature comes out as statistically insignificant. I would like suggestions on how to transform the features.

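    import statsmodels.api as sm

    # y: binary outcome, X_std: standardized features (both defined earlier)
    logit = sm.Logit(y, X_std)
    result = logit.fit()
    print(result.summary())

(logit output screenshot not included)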

The model's accuracy is high (85%), but that doesn't make sense to me when all the features are insignificant.

Please help


Solution

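  • Your sample size is not large enough to reliably estimate so many parameters: the dataset has 195 recordings and 22 voice features. When the explanatory variables are strongly correlated (several of the jitter and shimmer measures are near-duplicates of each other), their standard errors are inflated, so each one can be individually insignificant even though together they have real predictive power. Accuracy and the per-coefficient p-values answer different questions: accuracy reflects how well the features predict jointly, while each p-value asks whether that single coefficient is distinguishable from zero given all the others.

    With this many variables it's best to use penalized estimation (for example L1 or L2 regularization), or some method of feature selection, to reduce the number of parameters and get more reliable estimates.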