scikit-learn, cluster-analysis, logistic-regression

Clustering logistic regression models using scikit-learn


I have a bunch of logistic regression models and I want to see how well they cluster, effectively producing a few models that represent the whole group.

However, many of the models don't share the same parameters, and it seems odd to cluster on the betas when not every model has every parameter.


Solution

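  • I would recommend clustering on the log odds ratios of the explanatory variables (in a logistic regression these are the fitted coefficients themselves). For models that lack a given regressor, you can fill in the missing value with 0.0: a log odds ratio of 0 corresponds to an odds ratio of 1, i.e. no effect. This can be done quite easily with pandas.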

    Assume you have a list of all the models in this form:

    models = [{'beta1': m1_b1, 'beta2': m1_b2}, {'beta1': m2_b1, 'beta3': m2_b3}]
    

    In the naming above, m1_b1 means model 1, beta 1. Notice that the two models don't have the same betas.

    You can put them into a data frame like so:

    import pandas as pd
    df = pd.DataFrame(models).fillna(0.0)  # betas a model lacks become 0.0
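
Once the coefficients are in a data frame, any scikit-learn clusterer can be run on it. The answer does not name an algorithm, so the following is only a minimal sketch, assuming KMeans with two clusters. The coefficient values are made up for illustration, and `n_clusters=2` is an arbitrary choice.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical coefficients (log odds ratios) for four fitted models.
# The numbers are invented; in practice they come from your own models.
models = [
    {"beta1": 0.9, "beta2": -1.2},
    {"beta1": 1.0, "beta3": 0.1},
    {"beta1": -0.8, "beta2": 1.1},
    {"beta1": -1.0, "beta2": 1.3, "beta3": 0.2},
]

# One row per model, one column per beta; missing betas become 0.0.
df = pd.DataFrame(models).fillna(0.0)
coef_cols = list(df.columns)

# Assumption: KMeans with 2 clusters. Any other clusterer could be used.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(df[coef_cols])
df["cluster"] = labels

# Each cluster center is a coefficient vector that can serve as a
# "representative model" for its group.
centers = pd.DataFrame(kmeans.cluster_centers_, columns=coef_cols)

print(df)
print(centers)
```

Keep in mind that Euclidean distance between coefficient vectors is only meaningful if the regressors are on comparable scales, so you may want to standardize the inputs before fitting the original models.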