Tags: python, machine-learning, scikit-learn, svm, kernel-methods

Sklearn custom kernel gives wrong decision function


I have implemented my own custom linear kernel, which works fine with clf.predict. However, when I use clf.decision_function, it returns constant values for all points.

This is the code for the custom kernel:

```
import numpy as np

def linear_basis(x, y):
    return np.dot(x.T, y)

def linear_kernel(X, Y, K=linear_basis):
    gram_matrix = np.zeros((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            gram_matrix[i,j] = K(x,y)
        return gram_matrix
```

Now I use this kernel on a small, linearly separable training set.

```
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import svm

# creating random 2D points
sample_size = 100
dat = {
    'x': [random.uniform(-2,2) for i in range(sample_size)],
    'y': [random.uniform(-2,2) for i in range(sample_size)]
}

data = pd.DataFrame(dat)

# giving the random points a linear structure
f_lin = np.vectorize(lambda x, y: 1 if x > y else 0)
data['z_lin'] = f_lin(data['x'].values, data['y'].values)
data_pos = data[data.z_lin == 1.]
data_neg = data[data.z_lin == 0.]

X_train = data[['x', 'y']]
y_train = data[['z_lin']]

clf_custom_lin = svm.SVC(kernel=linear_kernel) # using my custom kernel here
clf_custom_lin.fit(X_train.values, y_train.values.ravel()) # ravel() gives a 1-D label array

# creating a 100x100 grid to manually predict each point in 2D
gridpoints = np.array([[i,j] for i in np.linspace(-2,2,100) for j in np.linspace(-2,2,100)])
gridresults = np.array([clf_custom_lin.predict([gridpoints[k]]) for k in range(len(gridpoints))])

# now plotting each point and the training samples
plt.scatter(gridpoints[:,0], gridpoints[:,1], c=gridresults, cmap='RdYlGn')
plt.scatter(data_pos['x'], data_pos['y'], color='green', marker='o', edgecolors='black')
plt.scatter(data_neg['x'], data_neg['y'], color='red', marker='o', edgecolors='black')
plt.show()
```

This gives the following result:

[Image: result of the custom linear kernel]

Now I want to reproduce the plot using clf.decision_function:

(Note: I accidentally switched the colors here.)

```
h = .02
xx, yy = np.meshgrid(np.arange(-2 - .5, 2 + .5, h),
    np.arange(-2 - .5, 2 + .5, h))

# using the .decision_function here
Z = clf_custom_lin.decision_function(np.c_[xx.ravel(), yy.ravel()]) 

Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.RdBu, alpha=.8)

plt.scatter(data_pos['x'], data_pos['y'], color='blue', marker='o', edgecolors='black')
plt.scatter(data_neg['x'], data_neg['y'], color='red', marker='o', edgecolors='black')
plt.show()
```

This gives the following plot:

[Image: constant predictions of the custom kernel]

For comparison, this is a plot of the same data using the integrated linear kernel (kernel="linear"):

[Image: working decision function with kernel="linear"]
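
For reference, a minimal sketch of how that built-in-kernel plot can be reproduced, reusing X_train, y_train and the xx, yy grid from above (the name clf_builtin_lin is mine, not from the original code):

```
# Same decision-function contour plot, but with the built-in linear kernel.
clf_builtin_lin = svm.SVC(kernel='linear')
clf_builtin_lin.fit(X_train.values, y_train.values.ravel())

Z = clf_builtin_lin.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.RdBu, alpha=.8)
plt.show()
```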

Since predicting with the custom kernel works, shouldn't the decision function produce the same working plot here? I have no idea why this works with the built-in linear kernel but not with my custom linear kernel, which does work when I only predict points without the decision function.


Solution

The actual problem is really silly, but since it took quite some time to track down, I'll share an outline of my debugging.

First, rather than plotting, print the actual values of the decision_function: you'll find that the first one is unique, but everything after it is constant. The same pattern persists on various slices of the dataset. So I thought perhaps some values were being overwritten, and I dug into the SVC code a bit. That led to some useful internal functions/attributes, like ._BaseLibSVM__Xfit containing the training data, _decision_function and _dense_decision_function, and _compute_kernel. None of that code indicated a problem, but running it showed the same behaviour: _compute_kernel returned results that were all zeros past the first row, and coming back to your code, linear_kernel already does that on its own. So, finally, it comes back to your linear_kernel function.
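
A minimal way to see this yourself, using the names from the question's code (the row slicing is just to keep the printout short):

```
# With the linear_kernel from the question, every row of the Gram matrix
# after the first comes back as all zeros.
K = linear_kernel(X_train.values, X_train.values)
print(K[0, :5])   # real dot products
print(K[1, :5])   # all zeros
print(K[2, :5])   # all zeros
```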

You return inside the outer for loop, so you only ever use the first row of X and never compute the rest of the matrix. (This raises a surprise: why did the predictions look good at all? That seems to have been a fluke. If you change the definition of f_lin to change the classes, the model still learns the slope-1 line.)
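
Moving the return out of the loops is enough to fix it. A minimal corrected version (identical to the question's function except for the indentation of the return):

```
def linear_kernel(X, Y, K=linear_basis):
    gram_matrix = np.zeros((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            gram_matrix[i, j] = K(x, y)
    return gram_matrix  # return only after every row has been computed
```

For a linear kernel you can also skip the Python loops entirely and compute the whole Gram matrix in one call, e.g. np.dot(X, Y.T).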