When I do ridge regression using sklearn in Python, the coef_ output gives me a 2D array. According to the documentation it is (n_targets, n_features).
I understand that features are my coefficients. However, I am not sure what targets are. What is this?
The targets are the values you want to predict. Ridge regression can predict several values for each instance, not just one. coef_ holds the coefficients for each target: row i contains the coefficients used to predict target i. The result is the same as if you had trained a separate model for each target.
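The equivalence with separately trained models can be checked directly. The following is only a minimal sketch: the synthetic data, the seed, and alpha=1.0 are assumptions chosen for illustration, and the variable names joint and separate are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (assumed for illustration): 50 instances, 2 features, 3 targets.
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 2))
Y = X @ np.array([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])

# One model fitted on all three targets at once.
joint = Ridge(alpha=1.0).fit(X, Y)

# Three models, each fitted on a single target column, stacked row by row.
separate = np.vstack(
    [Ridge(alpha=1.0).fit(X, Y[:, k]).coef_ for k in range(Y.shape[1])]
)

print(joint.coef_.shape)                    # (n_targets, n_features)
print(np.allclose(joint.coef_, separate))   # do the two approaches agree?
```

Each row of joint.coef_ should agree with the coefficients of the corresponding single-target model, up to floating-point error.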
Let's have a look at a simple example. I will use LinearRegression instead of Ridge, because Ridge shrinks the coefficients, which makes the result harder to interpret.
First, we create some random data:
import numpy as np
X = np.random.uniform(size=100).reshape(50, 2)  # 50 instances, 2 features
y = np.dot(X, [[1, 2, 3], [3, 4, 5]])  # 3 targets per instance
The first three instances in X are:
[[ 0.70335619 0.42612165]
[ 0.2959883 0.10571314]
[ 0.33868804 0.07351525]]
The targets y for these instances are:
[[ 1.98172114 3.11119897 4.24067681]
[ 0.61312771 1.01482915 1.41653058]
[ 0.55923378 0.97143708 1.38364037]]
Notice that, for each instance with features x and targets y, y[0] = x[0] + 3*x[1], y[1] = 2*x[0] + 4*x[1], and y[2] = 3*x[0] + 5*x[1] (that is how the matrix multiplication created the data).
If we now fit the linear regression model
from sklearn import linear_model
clf = linear_model.LinearRegression()
clf.fit(X, y)
the values in coef_ are:
[[ 1. 3.]
[ 2. 4.]
[ 3. 5.]]
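This exactly matches the equations we used to create the data: each row of coef_ holds the coefficients for one target, which is why its shape is (n_targets, n_features), here (3, 2).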
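For comparison, here is a hedged sketch of how Ridge behaves on the same kind of data. The seed, alpha=1.0, and the names W, ols and ridge are assumptions made for illustration. The coef_ shape is the same as with LinearRegression, but the values are pulled towards zero, so they no longer reproduce the generating matrix exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Same construction as above, with a fixed seed (an assumption, for repeatability).
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 2))
W = np.array([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])  # shape (n_features, n_targets)
y = X @ W

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Both models store one row of coefficients per target.
print(ols.coef_.shape, ridge.coef_.shape)

# Without regularisation the coefficients recover W transposed.
print(np.allclose(ols.coef_, W.T))

# With regularisation the coefficients are shrunk towards zero.
print(ridge.coef_)

# Predictions have one column per target.
print(ridge.predict(X[:3]).shape)
```

A larger alpha shrinks the Ridge coefficients further, while the layout of coef_ stays (n_targets, n_features).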