This code is from PyTorch's Transformer implementation:
self.linear1 = Linear(d_model, dim_feedforward, **factory_kwargs)
self.dropout = Dropout(dropout)
self.linear2 = Linear(dim_feedforward, d_model, **factory_kwargs)
self.norm1 = LayerNorm(d_model, eps=layer_norm_eps, **factory_kwargs)
self.norm2 = LayerNorm(d_model, eps=layer_norm_eps, **factory_kwargs)
self.norm3 = LayerNorm(d_model, eps=layer_norm_eps, **factory_kwargs)
self.dropout1 = Dropout(dropout)
self.dropout2 = Dropout(dropout)
self.dropout3 = Dropout(dropout)
Why do they add self.dropout1, self.dropout2, and self.dropout3 when self.dropout already exists and is the exact same function? Also, what is the difference between (self.linear1, self.linear2) and a single self.linear?
That's because each assignment creates a separate module instance, which keeps the Linear and Dropout layers distinct from one another in the network. self.dropout is used inside the feed-forward block (between linear1 and linear2), while self.dropout1, self.dropout2, and self.dropout3 are applied after the self-attention, cross-attention, and feed-forward sub-layers respectively, before each residual connection and layer norm. Since Dropout has no learnable parameters, the numbered instances behave the same as self.dropout; registering them as separate named sub-modules simply makes each application point explicit and independently configurable. self.linear1 and self.linear2, on the other hand, are genuinely different layers: together they form the position-wise feed-forward network, mapping d_model → dim_feedforward and then dim_feedforward → d_model, so a single self.linear could not replace them because the two layers have different shapes and different weights.
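To make this concrete, here is a minimal sketch of how such separately named modules are typically wired up in a decoder layer's forward pass. This is an illustration under simplifying assumptions (single fixed head count, ReLU activation, no masks), not the actual nn.TransformerDecoderLayer source; the class name ToyDecoderLayer is made up for the example.

import torch
from torch import nn

class ToyDecoderLayer(nn.Module):
    """Simplified sketch of a post-norm decoder layer, for illustration only."""

    def __init__(self, d_model=512, dim_feedforward=2048, dropout=0.1, nhead=8):
        super().__init__()
        # Two different shapes, so a single self.linear could not replace them.
        self.linear1 = nn.Linear(d_model, dim_feedforward)  # d_model -> dim_feedforward
        self.linear2 = nn.Linear(dim_feedforward, d_model)  # dim_feedforward -> d_model
        self.dropout = nn.Dropout(dropout)    # inside the feed-forward block
        self.dropout1 = nn.Dropout(dropout)   # after self-attention
        self.dropout2 = nn.Dropout(dropout)   # after cross-attention
        self.dropout3 = nn.Dropout(dropout)   # after the feed-forward block
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)

    def forward(self, x, memory):
        # Each numbered dropout sits at a different point in the computation graph.
        x = self.norm1(x + self.dropout1(self.self_attn(x, x, x)[0]))
        x = self.norm2(x + self.dropout2(self.cross_attn(x, memory, memory)[0]))
        ff = self.linear2(self.dropout(torch.relu(self.linear1(x))))
        x = self.norm3(x + self.dropout3(ff))
        return x


x = torch.randn(2, 10, 512)       # (batch, seq, d_model)
memory = torch.randn(2, 7, 512)   # e.g. encoder output
out = ToyDecoderLayer()(x, memory)
print(out.shape)                  # torch.Size([2, 10, 512])

Note that list(nn.Dropout(0.1).parameters()) is empty, which is why the three numbered dropouts are interchangeable in behavior; the numbering only documents where each one is applied.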