python, scikit-learn, regression, linear-regression, regularized

What is target in Python's sklearn coef_ output?


When I run ridge regression with sklearn in Python, the coef_ attribute is a 2D array. According to the documentation, its shape is (n_targets, n_features).

I understand that the features correspond to my coefficients. However, I am not sure what the targets are. What are they?


Solution

  • The targets are the values you want to predict. Ridge regression can in fact predict several values for each instance, not just one. coef_ contains the coefficients used to predict each target, one row per target. The result is the same as if you had trained a separate model for each target.

    Let's look at a simple example. I will use LinearRegression instead of Ridge, because Ridge shrinks the coefficients, which makes the result harder to interpret.

    First, we create some random data:

    import numpy as np
    X = np.random.uniform(size=100).reshape(50, 2)  # 50 instances, 2 features
    y = np.dot(X, [[1, 2, 3], [3, 4, 5]])           # 3 targets per instance
    

    The first three instances in X are:

    [[ 0.70335619  0.42612165]
     [ 0.2959883   0.10571314]
     [ 0.33868804  0.07351525]]
    

    The targets y for these instances are:

    [[ 1.98172114  3.11119897  4.24067681]
     [ 0.61312771  1.01482915  1.41653058]
     [ 0.55923378  0.97143708  1.38364037]]
    

    Notice that, for each instance with features x and targets y, y[0] = x[0] + 3*x[1], y[1] = 2*x[0] + 4*x[1] and y[2] = 3*x[0] + 5*x[1] (this follows from the matrix multiplication we used to create the data).

    If we now fit the linear regression model

    from sklearn import linear_model
    clf = linear_model.LinearRegression()
    clf.fit(X, y)
    

    the resulting clf.coef_ is:

    [[ 1.  3.]
     [ 2.  4.]
     [ 3.  5.]]
    

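    Each row of coef_ holds the coefficients for one target, so its shape here is (3, 2), i.e. (n_targets, n_features), and the values exactly match the equations we used to create the data.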
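
The same check can be sketched with Ridge itself, which is what the question asks about. The snippet below is a minimal illustration, not the only way to do it: the fixed seed, the value alpha=1.0, and the variable names (multi, separate, and so on) are assumptions chosen for the example. It fits all three targets at once, then fits one Ridge model per target column, and compares the coefficients.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Fixed seed so the run is reproducible (an assumption, not part of the original example).
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 2))                 # 50 instances, 2 features
Y = X @ np.array([[1, 2, 3], [3, 4, 5]])      # 50 instances, 3 targets

# Fit all three targets in a single model.
multi = Ridge(alpha=1.0).fit(X, Y)
coef_shape = multi.coef_.shape                # (n_targets, n_features)

# Fit one model per target column and stack the 1D coefficient vectors row by row.
separate = np.vstack(
    [Ridge(alpha=1.0).fit(X, Y[:, k]).coef_ for k in range(Y.shape[1])]
)

# Compare the joint fit with the per-target fits, up to floating-point tolerance.
same_coefficients = np.allclose(multi.coef_, separate)

print(coef_shape)
print(same_coefficients)
```

Here coef_shape is (3, 2): one row per target, one column per feature. Because of the regularization, the Ridge coefficients are shrunk and will not equal the values 1, 3, 2, 4, 3, 5 exactly, but the joint fit and the per-target fits should agree with each other.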