
Vowpalwabbit Base Learner - adaptive and normalize


I'm trying to understand the base learner in VowpalWabbit. I understand the online gradient descent and the feature hashing, but I'm trying to understand the adaptive and normalize features it uses. I understand their purpose (to adapt the per-feature learning rate and to normalize feature scales), but I was hoping to understand how they are programmed into VowpalWabbit. Can someone share the pseudo-code for these features, or point me to them in the code base?


Solution

  • You won't find gd.cc particularly easy to read. It is designed for throughput. Having said that ...

    Adaptive is based upon AdaGrad, which adjusts the effective learning rate per feature. To achieve this it accumulates a sum of squared gradients for each feature, https://github.com/VowpalWabbit/vowpal_wabbit/blob/aa88627c9e9ffed6c0eea165ff85b04f0a22c0b7/vowpalwabbit/gd.cc#L502 .

    Normalize is based upon a dimensional analysis argument. It tries to adjust the weights to match each feature's scale. To achieve this it keeps the maximum absolute value seen so far for each feature, https://github.com/VowpalWabbit/vowpal_wabbit/blob/aa88627c9e9ffed6c0eea165ff85b04f0a22c0b7/vowpalwabbit/gd.cc#L507 .

    Finally, you should try --coin, a newer base learner in VW that can yield superior results with default settings.
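
    To make the two mechanisms above concrete, here is a minimal Python sketch of a single update step combining them. This is an illustrative simplification, not VW's actual code: the function name, the use of squared loss, and the exact way the two denominators are combined are my assumptions, chosen only to show the bookkeeping (per-feature sum of squared gradients for adaptive, per-feature max absolute value for normalize).

    ```python
    import math

    def adaptive_normalized_step(w, sum_sq_grad, max_abs, x, y, eta=0.5):
        """One simplified update of a sparse linear model with squared loss.

        w           : dict feature -> weight
        sum_sq_grad : dict feature -> accumulated squared gradient (adaptive)
        max_abs     : dict feature -> max |value| seen so far (normalize)
        x           : dict feature -> value (one sparse example)
        y           : target label
        Returns the prediction made before the update.
        """
        pred = sum(w.get(i, 0.0) * v for i, v in x.items())
        grad_scalar = pred - y  # d(loss)/d(pred) for squared loss

        for i, v in x.items():
            # Normalize: track the largest absolute value of each feature.
            if abs(v) > max_abs.get(i, 0.0):
                max_abs[i] = abs(v)

            g = grad_scalar * v  # gradient w.r.t. weight i
            # Adaptive: accumulate squared gradients per feature (AdaGrad).
            sum_sq_grad[i] = sum_sq_grad.get(i, 0.0) + g * g

            # Effective learning rate shrinks with the accumulated gradient
            # and with the feature's observed scale.
            denom = math.sqrt(sum_sq_grad[i]) * max_abs[i]
            if denom > 0.0:
                w[i] = w.get(i, 0.0) - eta * g / denom
        return pred
    ```

    Running this repeatedly on the same example shows the prediction moving toward the label while the per-feature step size shrinks, which is the qualitative behavior the adaptive and normalize flags aim for.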