Tags: python, python-3.x, scikit-learn, logistic-regression

Ordinal logistic regression: intercept_ returns [1] instead of [n]


I'm running an ordinal (i.e. multinomial) ridge regression using the mord library (built on scikit-learn).

y is a single column containing integer values from 1 to 19.

X consists of 7 numerical variables, each binned into 4 buckets and dummy-encoded, for a total of 28 binary variables.

import pandas as pd
import numpy as np    
from sklearn import metrics
from sklearn.model_selection import train_test_split
import mord

in_X, out_X, in_y, out_y = train_test_split(X, y,
                                            stratify=y,
                                            test_size=0.3,
                                            random_state=42)

mul_lr = mord.OrdinalRidge(alpha=1.0,
                           fit_intercept=True,
                           normalize=False,
                           copy_X=True,
                           max_iter=None,
                           tol=0.001,
                           solver='auto').fit(in_X, in_y)

mul_lr.coef_ returns a [28 x 1] array, but mul_lr.intercept_ returns a single value instead of 19 values.

Any idea what I am missing?


Solution

  • If you would like your model to predict for all 19 categories, you first need to convert your label y to a one-hot encoding before training the model.

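    import numpy as np
    from sklearn.preprocessing import OneHotEncoder

    # OneHotEncoder expects a 2-D input, so reshape y into a single column;
    # subtracting 1 maps the labels 1..19 to 0..18
    y = np.asarray(y).reshape(-1, 1) - 1
    # `categories` replaces the `n_values` argument removed in newer scikit-learn releases
    enc = OneHotEncoder(categories=[list(range(19))])
    y = enc.fit_transform(y).toarray()  # dense array of shape (n_samples, 19)
    # train the model on the one-hot encoded y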
    

    After refitting on the one-hot encoded y, mul_lr.intercept_.shape should be (19,), i.e. one intercept per target column.
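
    The shape difference can be illustrated with a small NumPy sketch. It assumes that OrdinalRidge fits like an ordinary ridge regression, with one intercept per target column. This is not mord's code: ridge_fit is a hypothetical helper and the data are random stand-ins for the X and y described in the question.

```python
import numpy as np


def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge with an unpenalized intercept (illustrative helper only)."""
    X_mean = X.mean(axis=0)
    Y_mean = Y.mean(axis=0)
    Xc = X - X_mean  # centering lets the intercept be recovered afterwards
    Yc = Y - Y_mean
    # Solve (Xc^T Xc + alpha * I) W = Xc^T Yc
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ Yc)
    intercept = Y_mean - X_mean @ W
    return W, intercept


rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 200, 28, 19

# Random stand-ins: 28 binary features and integer labels from 1 to 19
X = rng.integers(0, 2, size=(n_samples, n_features)).astype(float)
y = rng.integers(1, n_classes + 1, size=n_samples)

# Single target column: one coefficient per feature and a single intercept
coef_1d, intercept_1d = ridge_fit(X, y.astype(float))

# One-hot target with 19 columns: one intercept per column
Y_onehot = np.eye(n_classes)[y - 1]
coef_2d, intercept_2d = ridge_fit(X, Y_onehot)

print(coef_1d.shape, np.shape(intercept_1d))
print(coef_2d.shape, intercept_2d.shape)
```

    Note that fitting on the one-hot target treats the 19 classes as independent regression targets, so the ordering of the labels is no longer used by the model.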