I'm trying to piece together how SGDClassifier picks its learning rate when I use the partial_fit method to train it. That is, my main learning loop looks like this:
import time
from sklearn.linear_model import SGDClassifier

# files, load_next_batch, Xvalid and yvalid are defined elsewhere
m = SGDClassifier(n_iter=1, alpha=0.01)
n_iter = 40
t0 = time.time()
for i in range(n_iter):
    for fname in files:
        X, y = load_next_batch(fname)
        m.partial_fit(X, y, classes=[0, 1])
    print("%d: valid-error: %f (time: %fs)"
          % (i, 1.0 - m.score(Xvalid, yvalid), time.time() - t0))
Now, since I make 40 passes through the whole training set, I'd like to anneal my learning rate over time. If I used fit instead of partial_fit, my understanding is that this would happen automatically (unless I modified the learning_rate parameter). However, it is unclear to me how this happens when using partial_fit, and skimming the code didn't help either. Could anyone clarify how I could achieve an annealed learning rate in my setting?
fit is using partial_fit internally, so the learning rate configuration parameters apply to both fit and partial_fit. The default annealing schedule is eta0 / sqrt(t) with eta0 = 0.01.
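If you want that inverse-scaling style of schedule explicitly, you can request it through the constructor and it will keep decaying across partial_fit calls. A minimal sketch, where the random batches are just stand-ins for real data:

import numpy as np
from sklearn.linear_model import SGDClassifier

# invscaling decays the step size as eta = eta0 / t**power_t,
# so power_t=0.5 gives the eta0 / sqrt(t) schedule mentioned above.
m = SGDClassifier(alpha=0.01, learning_rate='invscaling',
                  eta0=0.01, power_t=0.5)

rng = np.random.RandomState(0)
for epoch in range(3):
    X = rng.randn(100, 5)       # stand-in for one batch of features
    y = rng.randint(0, 2, 100)  # stand-in for the batch labels
    m.partial_fit(X, y, classes=[0, 1])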
Edit: this is not correct; as seen in the comments, the actual default schedule for SGDClassifier is 1.0 / (t + t0), where t0 is set heuristically and t is the number of samples seen in the past.
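One way to see that the sample counter is shared across partial_fit calls (so the default schedule keeps annealing in a loop like yours) is to watch the fitted model's t_ attribute, which counts the weight updates performed so far. A small sketch on made-up batches:

import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
m = SGDClassifier(alpha=0.01)  # default learning_rate='optimal'

for epoch in range(3):
    X = rng.randn(100, 5)
    y = rng.randint(0, 2, 100)
    m.partial_fit(X, y, classes=[0, 1])
    # t_ keeps growing across partial_fit calls, so the step
    # size 1.0 / (t + t0) keeps shrinking between epochs.
    print(epoch, m.t_)

If you would rather control the annealing yourself, you could also pass learning_rate='constant' and lower eta0 between epochs with set_params.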