I have a bunch of logistic regression models and I want to see how well they cluster, effectively building a few models to represent the whole group.
However, many of the models don't share the same parameters, and it seems weird to cluster on the betas when not every model has every parameter.
I would recommend clustering on the log odds ratios of the explanatory variables. That way, for models that lack a certain regressor, you can fill in the empty value with 0.0, which corresponds to an odds ratio of 1, i.e. no effect.
This can be done quite easily with pandas.
Assume you have a list of all the models in this form:
models = [{'beta1': m1_b1, 'beta2': m1_b2}, {'beta1': m2_b1, 'beta3': m2_b3}]
In the naming above, m1_b1 means model 1, beta 1. You'll notice the two models don't have the same betas.
You can put them into a data frame like so:
import pandas as pd
df = pd.DataFrame(models).fillna(0.0)
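To make this concrete, here is a minimal sketch of the whole pipeline. The coefficient values are made up for illustration, and the clustering step (Ward hierarchical clustering from SciPy, cut into two clusters) is only one possible choice and not something the approach above depends on. It also assumes the regressors are on comparable scales, so that Euclidean distance between coefficient vectors is meaningful.

```python
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical fitted coefficients (log odds ratios) for four models.
# Each model only lists the regressors it actually contains.
models = [
    {"beta1": 1.2, "beta2": -0.5},
    {"beta1": 1.1, "beta3": 0.1},
    {"beta2": 2.0, "beta3": -1.5},
    {"beta1": 0.1, "beta2": 2.1, "beta3": -1.4},
]

# Rows are models, columns are regressors. A regressor missing from a
# model becomes 0.0, i.e. an odds ratio of 1 (no effect).
df = pd.DataFrame(models).fillna(0.0)

# Ward linkage on the coefficient vectors, then cut the tree into
# two clusters (the number of clusters is an arbitrary choice here).
tree = linkage(df.to_numpy(), method="ward")
df["cluster"] = fcluster(tree, t=2, criterion="maxclust")

# One representative coefficient vector per cluster: the cluster mean.
representatives = df.groupby("cluster").mean()
print(representatives)
```

Each row of `representatives` can then be read as the averaged coefficients of one "summary" model for its cluster.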