vowpalwabbit

Will learning_rate be automatically adjusted using "adaptive" strategy for VowpalWabbit Python online learning?


I see that the VowpalWabbit command-line documentation mentions the --adaptive learning rate strategy as the default. I am not using the command-line utility to train, i.e. passing the whole training file through the command line. Instead, I have a large dataset that I iterate over, training one sample at a time with the Python online learning method, as follows:

for example in train_examples:
    model.learn(example)

Question:

  1. When using the Python online learning method, is the learning rate also adjusted automatically using the adaptive strategy, or does that only work when the training file is passed through the command-line utility?
  2. From my understanding, the --passes argument only works when the training file is passed through the command-line utility (correct me if I'm wrong). So, does the --decay_learning_rate argument below also have no effect in the Python online learning method?

--decay_learning_rate arg (=1) Set Decay factor for learning_rate between passes


Solution

  • There's no single global learning rate.

    There are:

    • A single default initial learning rate (-l <arg>)
    • Multiple (per-feature) learning rates maintained while learning

    Learning rates aren't constant; they decay over time as the model converges. They have good defaults, so you normally shouldn't worry about them while training.
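
    To connect this to your Python loop: the Python bindings drive the same core learner as the command-line tool, so these defaults apply to model.learn() as well, and any of the options discussed here can be passed as a CLI-style string when you construct the model. A minimal sketch (the Workspace constructor name is from recent vowpalwabbit releases; older ones expose it as pyvw.vw; the example strings are made up):

    import vowpalwabbit

    # --adaptive, --normalized and --invariant are on by default, so only
    # --quiet and an (optional) initial learning rate are passed here.
    model = vowpalwabbit.Workspace("--quiet -l 0.5")

    # Hypothetical VW-format examples: a label, then a namespace of features.
    train_examples = [
        "1 | price:0.23 sqft:0.25 age:0.05",
        "-1 | price:0.18 sqft:0.15 age:0.35",
    ]

    for example in train_examples:
        model.learn(example)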

    In some more detail:

    • Initial learning rate, optionally set with -l [ --learning_rate ] arg. It is set once at the beginning of learning, not on every example, and has a good default, so there's no need to specify it. If you do specify it, it will affect how fast you learn (how large the updates are) at the start, but the effect decays as learning progresses and the automatic adjustments kick in. Setting this initial learning rate to a high value is not advised since it might lead to early over-fitting.
    • --adaptive, --normalized and --invariant are 3 independent boolean switches that are turned on by default. Unless you really know what you're doing, just leave them alone (don't specify them) and they will all be in effect. These 3 switches affect the individual (per-feature) updates and are orthogonal to the initial learning rate. If you explicitly turn on one or more of these 3 switches at the start of training (this happens once, not with every example), the ones you didn't explicitly specify are implicitly turned off (see the sketch just after this list).
    • As you figured, --decay_learning_rate arg is only relevant when multiple passes are in effect, so you don't need to worry about it given that you don't use multiple passes. It applies an additional adjustment factor to the learning rate between passes.
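
    If you do want to experiment with those switches from Python, the difference between the defaults and an explicit override might look like this (same caveat as above about the Workspace constructor name):

    import vowpalwabbit

    # Defaults: --adaptive, --normalized and --invariant are all in effect.
    default_model = vowpalwabbit.Workspace("--quiet")

    # Naming only --adaptive implicitly turns the other two off, so list
    # every switch you still want whenever you override any of them.
    adaptive_only = vowpalwabbit.Workspace("--quiet --adaptive")
    all_three = vowpalwabbit.Workspace("--quiet --adaptive --normalized --invariant")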

    Plotting your average loss (or the loss "since last" column) over time can help you figure out convergence. You may use vw-convergence from the vw source utl directory, feeding it your progress output, to chart your convergence. vw-convergence is a CLI utility that requires perl and R; a small Python sketch for tracking the same kind of running average loss appears at the end of this answer.

    Here's an example of a chart generated by vw-convergence

    [Chart: vw online training loss convergence/progress]
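
    If you'd rather watch convergence directly from the Python loop, a rough equivalent of vw's progress columns is a running progressive-validation loss: predict each example before learning from it and average the losses. A minimal sketch, assuming the default squared loss, a bare numeric label in each example string, and the same hedged Workspace constructor as above:

    import vowpalwabbit

    model = vowpalwabbit.Workspace("--quiet")
    train_examples = ["1 | a:1 b:2", "-1 | a:0.5 c:3"]  # hypothetical VW-format examples

    # Progressive validation: predict each example before learning from it,
    # then keep a running average of the squared losses.
    total_loss, seen = 0.0, 0
    for example in train_examples:
        label = float(example.split("|", 1)[0].strip())  # assumes a bare numeric label
        pred = model.predict(example)
        total_loss += (pred - label) ** 2
        model.learn(example)
        seen += 1
        if seen % 1000 == 0:
            print(f"examples: {seen}  average loss: {total_loss / seen:.6f}")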