I'm running an ordinal (i.e. multi-class) ridge regression using the mord library (built on scikit-learn).
y is a single column containing integer values from 1 to 19.
X is made of 7 numerical variables, each binned into 4 buckets and then dummied into a final total of 28 binary variables.
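For readers who want to reproduce a design matrix of this shape, here is a minimal sketch of the binning and dummying step. It assumes quantile binning (pd.qcut) and uses synthetic data; the column names var1..var7 and bucket labels b1..b4 are hypothetical, since the question does not say how the buckets were chosen.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data: 7 continuous variables
rng = np.random.default_rng(0)
raw = pd.DataFrame(rng.normal(size=(200, 7)),
                   columns=[f"var{i}" for i in range(1, 8)])

# Bin each variable into 4 quantile buckets (assumed binning scheme)
binned = pd.DataFrame({
    col: pd.qcut(raw[col], q=4, labels=["b1", "b2", "b3", "b4"])
    for col in raw.columns
})

# One dummy column per (variable, bucket) pair: 7 * 4 = 28 binary columns
X = pd.get_dummies(binned).astype(int)
print(X.shape)  # (200, 28)
```

Every row has exactly one 1 per original variable, so each row sums to 7.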
import pandas as pd
import numpy as np
from sklearn import metrics
from sklearn.model_selection import train_test_split
import mord
# X: 28 binary dummy columns, y: integer labels 1..19 (prepared earlier)
in_X, out_X, in_y, out_y = train_test_split(X, y,
                                            stratify=y,
                                            test_size=0.3,
                                            random_state=42)
mul_lr = mord.OrdinalRidge(alpha=1.0,
                           fit_intercept=True,
                           normalize=False,
                           copy_X=True,
                           max_iter=None,
                           tol=0.001,
                           solver='auto').fit(in_X, in_y)
mul_lr.coef_ returns a [28 x 1] array, but mul_lr.intercept_ returns a single value (instead of 19).
Any idea what I am missing?
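The shapes in the question can be reproduced with plain scikit-learn Ridge on synthetic data. This sketch assumes that mord.OrdinalRidge subclasses sklearn.linear_model.Ridge and predicts by rounding a single regression score and clipping it to the label range; that is my reading of mord, not something stated here, so check mord's source before relying on it. With a 1-D target, ridge fits one linear score w @ x + b, hence one intercept.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 28))   # 28 binary dummies (synthetic)
y = rng.integers(1, 20, size=300)        # integer labels 1..19 (synthetic)

# A ridge regression on a 1-D target fits ONE linear score: w @ x + b
model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_.shape)          # (28,)
print(np.ndim(model.intercept_))  # 0, i.e. a single scalar intercept

# Assumed OrdinalRidge-style prediction: round the score, clip to label range
score = model.predict(X)
labels = np.clip(np.round(score), y.min(), y.max()).astype(int)
```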
If you want your model to produce a separate score (and intercept) for each of the 19 categories, you first need to convert your label y to a one-hot encoding before training the model.
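import numpy as np
from sklearn.preprocessing import OneHotEncoder
# range 1..19 -> range 0..18, reshaped into the single column the encoder expects
y = np.asarray(y).reshape(-1, 1) - 1
# n_values was removed in scikit-learn 0.22; list the 19 categories explicitly
enc = OneHotEncoder(categories=[list(range(19))])
y = enc.fit_transform(y).toarray()  # dense array of shape (n_samples, 19)
# ...then split and fit mul_lr exactly as in the question, using this y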
Now mul_lr.intercept_.shape should be (19,).
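A quick way to check the claimed shapes without mord installed is to fit scikit-learn's Ridge (assumed here to be the base class of mord.OrdinalRidge) on a one-hot target built from synthetic data. The argmax step that maps the 19 scores back to a label is an illustrative choice, not part of the original answer.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 28))   # 28 binary dummies (synthetic)
y = rng.integers(1, 20, size=300)        # integer labels 1..19 (synthetic)

# One-hot encode the shifted labels 0..18 into 19 indicator columns
enc = OneHotEncoder(categories=[list(range(19))])
Y = enc.fit_transform((y - 1).reshape(-1, 1)).toarray()

# A 2-D target makes Ridge fit one score (and one intercept) per column
model = Ridge(alpha=1.0).fit(X, Y)
print(model.intercept_.shape)  # (19,)
print(model.coef_.shape)       # (19, 28)

# Map the 19 scores back to a label in 1..19 (illustrative choice)
pred = model.predict(X).argmax(axis=1) + 1
```

With a one-hot target, each of the 19 columns is fitted independently, so the ordering of the levels 1..19 is no longer used by the model.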