I see that the VowpalWabbit command-line documentation mentions --adaptive as the default learning-rate strategy. I am not using the command-line utility to train, i.e. passing the whole training file through the command line. Instead, I have a large dataset that I am iterating over, training one sample at a time with the Python online learning method:
    for example in train_examples:
        model.learn(example)
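where model is constructed once up front, roughly like this (a minimal sketch; I'm assuming the vowpalwabbit Python package, whose constructor accepts the same argument string as the CLI):

    import vowpalwabbit

    # Constructed once; any flags are applied here, not per example.
    # (Recent versions expose Workspace; older versions expose the
    # same thing as pyvw.vw.)
    model = vowpalwabbit.Workspace("--quiet")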
Questions:

1. Does this Python online training also use the adaptive learning-rate strategy, or does it only work if the training file is passed through the command-line utility?

2. The --passes argument only works when the training file is passed through the command-line utility (correct me if I'm wrong). So does the --decay_learning_rate argument below also not work in the Python online learning method?

    --decay_learning_rate arg (=1)    Set Decay factor for learning_rate between passes
There's no single global learning rate. There are several learning-rate related options:

    -l <arg>                                  the initial learning rate
    --adaptive, --normalized, --invariant     per-feature update switches, on by default
    --decay_learning_rate <arg>               decay between passes

Learning rates aren't constant: they decay over time as the model converges. They have good defaults, so you shouldn't normally worry about them while training.
In some more detail:
-l [ --learning_rate ] arg is the initial learning rate. It is set once at the beginning of learning, not at every example, and it has a good default, so there's no need to specify it. If you do specify it, it will affect how fast you learn (how large the updates are) at the start, but the effect will decay as learning progresses and automatic adjustments are applied. Setting this initial learning rate to a high value is not advised, since it might lead to early over-fitting.
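For instance (a sketch, again assuming the vowpalwabbit Python package, which takes the same argument string as the CLI):

    import vowpalwabbit

    # The initial learning rate is fixed once, at construction time
    # (0.5 happens to be the default); the automatic per-feature
    # adjustments then take over as examples stream in.
    model = vowpalwabbit.Workspace("-l 0.5 --quiet")

    model.learn("1 | a:1.0 b:2.0")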
--adaptive, --normalized and --invariant are 3 independent boolean switches that are turned on by default. Unless you really know what you're doing, just leave them alone (don't specify them) and they will all be in effect. These 3 switches affect the updates on individual (per-feature) learning rates and are orthogonal to the initial learning rate. If you explicitly turn on one or more of these 3 switches, this happens once at the start of training (not with every example), and it will implicitly turn off the ones you didn't explicitly specify.
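To illustrate the implicit turn-off behavior (a sketch with the same assumed Python package):

    import vowpalwabbit

    # All three switches are on by default:
    default_model = vowpalwabbit.Workspace("--quiet")

    # Naming only --adaptive implicitly turns off --normalized and
    # --invariant, so if you override anything, spell out everything
    # you still want:
    adaptive_only = vowpalwabbit.Workspace("--adaptive --quiet")
    all_three = vowpalwabbit.Workspace("--adaptive --normalized --invariant --quiet")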
--decay_learning_rate arg is only relevant when multiple passes are in effect, so you don't need to worry about it given that you don't use multiple passes. It applies an additional decay factor to the learning rate between passes.

Plotting your average loss (or the loss "since last" column) over time can help you figure out convergence. You may use vw-convergence, found in the utl directory of the vw source tree, feeding it your progress output to see your convergence. vw-convergence is a CLI utility that requires perl and R.
Here's an example of a chart generated by vw-convergence
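If you're training through the Python bindings rather than the CLI, you can compute the same kind of progressive (predict-then-learn) average loss yourself and plot it. A minimal sketch, assuming the vowpalwabbit package, the default squared loss, and a made-up example stream:

    import vowpalwabbit

    model = vowpalwabbit.Workspace("--quiet")

    # Hypothetical stream of labeled examples in VW text format.
    train_examples = ["1 | a:1.0 b:0.5", "-1 | a:0.2 b:1.3"] * 500

    total_loss, n = 0.0, 0
    for example in train_examples:
        label = float(example.split()[0])
        pred = model.predict(example)  # predict first: progressive validation
        model.learn(example)
        total_loss += (pred - label) ** 2
        n += 1
        if n % 100 == 0:
            print(f"examples={n}  average_loss={total_loss / n:.6f}")

Watching average_loss flatten out over time tells you roughly when the online learner has converged.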