machine-learning, artificial-intelligence, reinforcement-learning, gradient

Why is RMSProp considered "leaky"?


decay_rate = 0.99 # decay factor for RMSProp leaky sum of grad^2

I'm perplexed by the wording of comments like the one above, which talk about a "leaky" sum of squares for the RMSProp optimizer. So far I've been able to determine that this particular line was copied from Andrej Karpathy's Deep Reinforcement Learning: Pong from Pixels, and that RMSProp is an unpublished optimizer proposed by Hinton in one of his Coursera classes. Looking at the math for RMSProp from link 2, it's hard to see how any of this is "leaky."

Would anyone happen to know why RMSProp is described this way?


Solution

  • RMSProp keeps an exponentially decaying average of squared gradients. The (admittedly unfortunate) wording "leaky" refers to how much of the previous estimate "leaks" into the current one, since

    E[g^2]_t := 0.99 E[g^2]_{t-1} + 0.01 g^2_t
                \_______________/   \________/
                   "leaking"         new data