I'd like to compare the performance of the SVM classifier with and without stochastic gradient descent. In sklearn I've only found the SGDClassifier (which I can put into a pipeline). Doesn't sklearn provide an implementation of a non-stochastic (batch) gradient-descent classifier? Do I have to implement both classifiers on my own in order to conduct an analysis?
When SVMs and SGD can't be combined
SVMs are often used in combination with the kernel trick, which enables classification of non-linearly separable data. This answer explains why you wouldn't use stochastic gradient descent to solve a kernelised SVM: https://stats.stackexchange.com/questions/215524/is-gradient-descent-possible-for-kernelized-svms-if-so-why-do-people-use-quadr In short, once the kernel trick is applied, the decision function involves every training point, so the cheap per-sample updates that make SGD attractive are lost.
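That said, if you want SGD-style training with a non-linear decision boundary, one common workaround is to approximate the kernel explicitly and fit a linear model on the transformed features. Below is a minimal sketch using sklearn's Nystroem transformer (my own illustration, not part of the comparison later in this answer; the dataset and parameter values are arbitrary):

from sklearn.datasets import make_circles
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Non-linearly separable toy data
X, y = make_circles(n_samples=1000, noise=0.1, factor=0.3, random_state=0)

# Approximate an RBF kernel with 100 components,
# then fit a linear SVM on the transformed features via SGD
clf = make_pipeline(
    Nystroem(kernel='rbf', gamma=1.0, n_components=100, random_state=0),
    SGDClassifier(loss='hinge', max_iter=1000, tol=1e-3)
)
clf.fit(X, y)
print(clf.score(X, y))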
Linear SVMs
If we stick to linear SVMs, then we can run an experiment using sklearn, as it provides wrappers over libsvm (SVC) and liblinear (LinearSVC), and also offers the SGDClassifier. I recommend reading the linked documentation of libsvm and liblinear to understand what is happening under the hood.
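Note that SGDClassifier with its default hinge loss optimises essentially the same objective as a linear SVM; only the solver and the regularisation parametrisation differ (SVC and LinearSVC use C, SGDClassifier uses alpha, with alpha roughly 1/(C * n_samples), as noted in the sklearn documentation). A sketch of comparable configurations, where the values of C and n are illustrative:

from sklearn.svm import SVC, LinearSVC
from sklearn.linear_model import SGDClassifier

n = 1000  # assumed number of training samples, for the alpha <-> C conversion
C = 1.0
svc = SVC(kernel='linear', C=C)                         # libsvm solver
linear_svc = LinearSVC(C=C, dual=False)                 # liblinear solver
sgd = SGDClassifier(loss='hinge', alpha=1.0 / (C * n))  # stochastic gradient descent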
Comparison on example dataset
Below is a comparison of training time and accuracy on a randomly generated dataset (which may not be representative of your problem). You should alter the setup to fit your requirements.
import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC, LinearSVC
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Randomly generated dataset: linear decision function + noise
np.random.seed(0)
X = np.random.normal(size=(50000, 10))
coefs = np.random.normal(size=10)
epsilon = np.random.normal(size=50000)
y = (X @ coefs + epsilon) > 0

# Classifiers to compare; libsvm scales poorly with n,
# so cap its training-set size at 4000 samples
algos = {
    'LibSVM': {
        'model': SVC(),
        'max_n': 4000,
        'time': [],
        'error': []
    },
    'LibLinear': {
        'model': LinearSVC(dual=False),
        'max_n': np.inf,
        'time': [],
        'error': []
    },
    'SGD': {
        'model': SGDClassifier(max_iter=1000, tol=1e-3),
        'max_n': np.inf,
        'time': [],
        'error': []
    }
}

# Training-set sizes to evaluate
splits = list(range(100, 1000, 100)) + \
         list(range(1500, 5000, 500)) + \
         list(range(6000, 50000, 1000))

for i in splits:
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=1 - i / 50000, random_state=0)
    for k, v in algos.items():
        if i < v['max_n']:
            model = v['model']
            # Time the fit
            t0 = time.time()
            model.fit(X_train, y_train)
            t1 = time.time()
            v['time'].append(t1 - t0)
            # Misclassification rate on the held-out data
            preds = model.predict(X_test)
            v['error'].append((preds != y_test).mean())
Plotting the timings, we see that the traditional libsvm solver becomes impractical for large n, while the liblinear and SGD implementations scale well computationally.
plt.figure()
for k, v in algos.items():
    plt.plot(splits[:len(v['time'])], v['time'], label='{} time'.format(k))
plt.legend()
plt.xscale('log')
plt.title('Time comparison')
plt.xlabel('Number of training examples')
plt.ylabel('Training time (s)')
plt.show()
Plotting the error, we see that SGD is worse than LibSVM at the same training-set size, but if you have a large training set this becomes a minor point. The liblinear algorithm performs best on this dataset:
plt.figure()
for k, v in algos.items():
    plt.plot(splits[:len(v['error'])], v['error'], label='{} error'.format(k))
plt.legend()
plt.xscale('log')
plt.title('Error comparison')
plt.xlabel('Number of training examples')
plt.ylabel('Error')
plt.show()
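Finally, since the question mentions pipelines: SGDClassifier (like other gradient-based learners) is sensitive to feature scaling, so in practice you would put a scaler in front of it. A minimal sketch, reusing the X and y generated above (the cv value is arbitrary):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

# Scale features, then train the SGD-based linear SVM
pipe = make_pipeline(StandardScaler(), SGDClassifier(max_iter=1000, tol=1e-3))
print(cross_val_score(pipe, X, y, cv=5).mean())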