python machine-learning scikit-learn logistic-regression

Logistic regression coefficient meaning

I'm trying to write my own logistic regressor (using batch/mini-batch gradient descent) for practice purposes.

I generated a random dataset (see below) with normally distributed inputs, and the output is binary (0,1). I manually used coefficients for the input and was hoping to be able to reproduce them (see below for the code snippet). However, to my surprise, neither my own code, nor sklearn LogisticRegression were able to reproduce the actual numbers (although the sign and order of magnitude are in line). Moreso, the coefficients my algorithm produced are different than the one produced by sklearn. Am I misinterpreting what the coefficients for a logistic regression are?

I will appreciate any insight into this discrepancy.

Thank you!

edit: I tried using statsmodels Logit and got yet a third set of slightly different values for the coefficients

Some more info that might be relevant: I wrote a linear regressor using an almost identical code and it worked perfectly, so I am fairly confident this is not a problem in the code. Also my regressor actually outperformed the sklearn one on the training set, and they have the exact same accuracy on the test set, so I have no reason to believe the regressors are wrong.

Code snippets for the generation of the dataset:

o1 = 2
o2 = -3
x[:,1]=np.random.rand(size)*2
x[:,2]=np.random.rand(size)*3
y = np.vectorize(sigmoid)(x[:,1]*o1+x[:,2]*o2 + np.random.normal(size=size))

so as can be seen, input coefficients are +2 and -3 (intercept 0); sklearn coefficients were ~2.8 and ~-4.8; my coefficients were ~1.7 and ~-2.6

and of the regressor (the most relevant parts of it):

for j in range(bin_size):
    xs = x[i]
    y_real = y[i]
    z = np.dot(self.coeff,xs)
    h = sigmoid(z)
    dc+= (h-y_real)*xs
self.coeff-= dc * (learning_rate/n)

Solution

What was the intercept learned? It really should not be a surprise, as your y is polynomial of 3rd degree, while your model has only two coefficients, while 3 + y-intercept would be needed to model the response variable from predictors.

Furthermore, values may be different due to SGD for example.

Not really sure, but the coefficients could be different and return correct y for finite set of points. What are the metrics on each model? Do those differ?