I'm trying to understand the difference between RidgeClassifier and LogisticRegression in sklearn.linear_model. I couldn't find it in the documentation.
I think I understand quite well what LogisticRegression does. It computes the coefficients and intercept that minimise half the sum of squares of the coefficients + C times the binary cross-entropy loss, where C is the regularisation parameter. I checked against a naive implementation from scratch, and the results coincide.
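For reference, here is a sketch of the check described above: minimising 0.5·||w||² + C·(sum of log-losses) directly with scipy and comparing against sklearn's fitted coefficients. The dataset and parameter values are made up for illustration; the intercept is left unpenalised, as sklearn does.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
y_pm = np.where(y == 1, 1.0, -1.0)  # recode labels as +/-1
C = 0.7

def objective(params):
    w, b = params[:-1], params[-1]
    margins = y_pm * (X @ w + b)
    # 0.5 * ||w||^2 + C * sum of log-losses; the intercept b is not penalised.
    # logaddexp(0, -m) computes log(1 + exp(-m)) in a numerically stable way.
    return 0.5 * w @ w + C * np.sum(np.logaddexp(0.0, -margins))

res = minimize(objective, np.zeros(X.shape[1] + 1),
               method="BFGS", options={"gtol": 1e-8})
clf = LogisticRegression(C=C, tol=1e-10, max_iter=1000).fit(X, y)

print(np.allclose(res.x[:-1], clf.coef_.ravel(), atol=1e-3))
```

Both optimisers should converge to (numerically) the same coefficients and intercept.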
The results of RidgeClassifier differ, and I couldn't figure out how the coefficients and intercept are computed there. Looking at the GitHub code, I'm not experienced enough to untangle it.
The reason I'm asking is that I like the RidgeClassifier results -- it generalises a bit better on my problem. But before I use it, I would like to at least have an idea of where it comes from.
Thanks for any help.
RidgeClassifier() works differently from LogisticRegression() with an l2 penalty: its loss function is not cross-entropy. RidgeClassifier() uses a Ridge() regression model in the following way to create a classifier.
Let us consider binary classification for simplicity.

1. Convert the target variable into +1 or -1 depending on the class it belongs to.
2. Build a Ridge() model (which is a regression model) to predict this target. The loss function is MSE + l2 penalty.
3. If the Ridge() regression's prediction (the value returned by decision_function()) is greater than 0, predict the positive class; otherwise, predict the negative class.
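The binary steps above can be verified empirically. This is a sketch, assuming RidgeClassifier's default solver matches Ridge's: fit Ridge on +/-1 targets by hand, threshold at 0, and compare with RidgeClassifier on the same data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Ridge, RidgeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# The classifier, as shipped by sklearn
clf = RidgeClassifier(alpha=1.0).fit(X, y)

# The same thing by hand: regress on +/-1 targets, then threshold at 0
y_pm = np.where(y == 1, 1.0, -1.0)
reg = Ridge(alpha=1.0).fit(X, y_pm)
pred_by_hand = (reg.predict(X) > 0).astype(int)

print(np.allclose(clf.decision_function(X), reg.predict(X)))
print((clf.predict(X) == pred_by_hand).all())
```

If the description is right, the decision function values and the predicted classes should coincide exactly.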
For multi-class classification:

1. Use LabelBinarizer() to turn the problem into a multi-output regression scenario, then train independent Ridge() regression models, one per class (one-vs-rest modelling).
2. Get a prediction from each class's Ridge() regression model (a real number per class) and use argmax over these values to predict the class.
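The multi-class recipe can be sketched the same way, assuming the same defaults: binarize the labels to a +/-1 matrix, fit one multi-output Ridge(), and take the argmax of the per-class scores.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Ridge, RidgeClassifier
from sklearn.preprocessing import LabelBinarizer

X, y = load_iris(return_X_y=True)  # 3 classes

clf = RidgeClassifier(alpha=1.0).fit(X, y)

# By hand: binarize y into an (n_samples, n_classes) matrix of +/-1,
# fit one multi-output Ridge, then argmax over the per-class scores
lb = LabelBinarizer(pos_label=1, neg_label=-1)
Y = lb.fit_transform(y)
reg = Ridge(alpha=1.0).fit(X, Y)
pred_by_hand = lb.classes_[np.argmax(reg.predict(X), axis=1)]

print((clf.predict(X) == pred_by_hand).all())
```

Note that all classes share one Ridge() fit here; since ridge regression treats each output column independently, this is equivalent to fitting a separate model per class.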