Tags: neural-network, backpropagation

Momentum in neural networks

Should the momentum factor relate to both the dataset instance and the individual weight, or just the weight? E.g.:

def get_momentum(instance, weight):
    # Pseudocode: returns the momentum factor for this (instance, weight) pair.
    return float

instance1 = ...  # 1 x n input vector
instance2 = ...  # 1 x n input vector
weights   = ...  # 1 x n weight vector

# Option 1: momentum depends on both the instance and the weight
get_momentum( instance1, weights[0] ) # eg returns 0.1
get_momentum( instance2, weights[0] ) # eg returns 0.3 <-- same weight, different momentum

# Option 2: momentum depends on the weight only
get_momentum( instance1, weights[0] ) # eg returns 0.1
get_momentum( instance2, weights[0] ) # eg returns 0.1

The second alternative has lower memory complexity. However, I believe it would also make the learning algorithm more likely to get stuck in local optima than the first alternative, since Option 1 should produce a stronger momentum "pull".
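
To make the two options concrete, here is a minimal runnable sketch of the bookkeeping each one implies. The update rule (delta = -lr * grad + mu * prev_delta) and all names and sizes are illustrative assumptions, not taken from any particular implementation:

import numpy as np

# Illustrative sketch: classic momentum update, delta = -lr * grad + mu * prev_delta.
# All sizes and names here are assumptions for the sake of the example.
n_instances, n_weights = 4, 3
lr, mu = 0.1, 0.9

# Option 1: one stored delta per (instance, weight) pair -> O(instances * weights)
prev_delta_per_instance = np.zeros((n_instances, n_weights))

# Option 2: one stored delta per weight, shared by all instances -> O(weights)
prev_delta_shared = np.zeros(n_weights)

def step_option1(weights, grad, i):
    # Momentum is tracked separately for each training instance i.
    delta = -lr * grad + mu * prev_delta_per_instance[i]
    prev_delta_per_instance[i] = delta
    return weights + delta

def step_option2(weights, grad):
    # Momentum is shared across all training instances.
    delta = -lr * grad + mu * prev_delta_shared
    prev_delta_shared[:] = delta
    return weights + delta

For reference, the classic (Polyak) momentum formulation keeps exactly one velocity per weight, which corresponds to Option 2.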


Solution

  • Tested

    I've done some testing of my hypothesis. The two approaches appear to perform almost the same, but there is a small, consistent improvement when using the first alternative.

    Memory complexity of the momentum data structure (a rough size comparison follows the list):

    • Approach 1: O( instances * weights )
    • Approach 2: O( weights )
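
    To put those two complexities in perspective, here is a back-of-the-envelope size comparison. The instance and weight counts are invented purely for illustration:

    # Hypothetical sizes, purely for illustration.
    n_instances = 10_000
    n_weights = 1_000_000
    bytes_per_float = 8

    approach1 = n_instances * n_weights * bytes_per_float  # 8e10 bytes, ~80 GB
    approach2 = n_weights * bytes_per_float                # 8e6 bytes,  ~8 MB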

    Result:

    Each round starts from a predefined set of initial weights, and both versions were trained from the same sets.

    $ pypy backprop.py # First approach
    Round: 1/10     Required epochs: 40995
    Round: 2/10     Required epochs: 40997
    Round: 3/10     Required epochs: 40996
    Round: 4/10     Required epochs: 40997
    Round: 5/10     Required epochs: 40997
    Round: 6/10     Required epochs: 40997
    Round: 7/10     Required epochs: 40999
    Round: 8/10     Required epochs: 40996
    Round: 9/10     Required epochs: 40996
    Round: 10/10    Required epochs: 40997
    
    $ pypy backprop.py # Second approach
    Round: 1/10     Required epochs: 41070
    Round: 2/10     Required epochs: 41072
    Round: 3/10     Required epochs: 41069
    Round: 4/10     Required epochs: 41069
    Round: 5/10     Required epochs: 41070
    Round: 6/10     Required epochs: 41071
    Round: 7/10     Required epochs: 41072
    Round: 8/10     Required epochs: 41069
    Round: 9/10     Required epochs: 41070
    Round: 10/10    Required epochs: 41071
    

    As the tests show, the second approach (the one with lower memory complexity) consistently requires a few more epochs of training, roughly 70 out of about 41,000 (~0.2%), before reaching the required precision.

    Conclusion

    The minor reduction in training epochs is probably not worth the increased memory complexity of the first approach.