If I want to implement dropout myself, is something like the following sufficient (taken from Implementing dropout from scratch)?
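import torch
import torch.nn as nn


class MyDropout(nn.Module):
    def __init__(self, p: float = 0.5):
        super().__init__()
        if p < 0 or p > 1:
            raise ValueError("dropout probability has to be between 0 and 1, but got {}".format(p))
        self.p = p

    def forward(self, X):
        if self.training:
            # Keep each element with probability 1 - p (a single Bernoulli trial per element).
            binomial = torch.distributions.binomial.Binomial(probs=1 - self.p)
            # Move the sampled mask to X's device so this also works when X is on a GPU.
            mask = binomial.sample(X.size()).to(X.device)
            # Rescale the kept elements so the expected activation is unchanged.
            return X * mask * (1.0 / (1 - self.p))
        return X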
My concern is that even if the unwanted weights are masked out (either this way or by using a mask tensor), gradients can still flow through the zeroed weights (https://discuss.pytorch.org/t/custom-connections-in-neural-network-layers/3027/9). Is my concern valid?
Dropout does not mask the weights; it masks the features.
For a linear layer implementing y = <w, x>, the gradient w.r.t. the parameters w is x. Therefore, if you set entries in x to zero, the corresponding weights in the adjacent linear layer receive zero gradient, and hence no update, from that sample.
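You can check this numerically. The snippet below is only a minimal sketch, not part of the original post: the mask values, the drop probability `p = 0.4`, and the variable names are made up for illustration. It builds y = <w, x> with some entries of x zeroed and confirms that the gradient w.r.t. w equals the masked x, so it is zero exactly where the features were dropped.

```python
import torch

torch.manual_seed(0)

x = torch.randn(5)                      # input features
w = torch.randn(5, requires_grad=True)  # weights of a single linear unit

# Hypothetical fixed dropout mask (chosen by hand for illustration):
# features 1 and 3 are dropped, the others are kept.
mask = torch.tensor([1.0, 0.0, 1.0, 0.0, 1.0])
p = 0.4  # assumed drop probability, used only for the usual 1 / (1 - p) rescaling

x_dropped = x * mask / (1 - p)

# y = <w, x_dropped>, so dy/dw = x_dropped.
y = torch.dot(w, x_dropped)
y.backward()

grad_w = w.grad.clone()
print(grad_w)  # entries 1 and 3 are zero
```

Note that this is for one sample. In a mini-batch each sample normally gets its own mask, so a given weight still receives gradient from the samples in which its input feature was kept.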