It looks like PyTorch's Transformer layers give non-reproducible outputs. This happens on both CPU and GPU. I know this can sometimes happen because of parallel computation on the GPU.
import torch
import torch.nn as nn

device = 'cpu'  # same behavior with 'cuda'

emb = nn.Embedding(10, 12).to(device)
inp1 = torch.LongTensor([1, 2, 3, 4]).to(device)
inp1 = emb(inp1).reshape(inp1.shape[0], 1, 12)  # (S, N, E)
encoder_layer = nn.TransformerEncoderLayer(d_model=12, nhead=4).to(device)
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=4).to(device)
out1 = transformer_encoder(inp1)
out2 = transformer_encoder(inp1)
out1 and out2 are different. It could be multithreading on the CPU, but the results look too unstable for that. How can I fix this?
nn.TransformerEncoderLayer has a default dropout rate of 0.1. The units to be dropped are re-randomized on every forward pass while the model is in training mode, which is why repeated calls give different outputs. If you want to train the model with dropout, keep this behavior during training and call model.eval() before testing.
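A minimal sketch of the eval() approach, keeping the default dropout of 0.1 (same toy setup as in the question):

```python
import torch
import torch.nn as nn

# Same setup as the question, with the default dropout=0.1 left in place.
emb = nn.Embedding(10, 12)
inp = emb(torch.LongTensor([1, 2, 3, 4])).reshape(4, 1, 12)  # (S, N, E)

encoder_layer = nn.TransformerEncoderLayer(d_model=12, nhead=4)  # dropout=0.1 default
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)

transformer_encoder.eval()  # disables dropout (and other train-only behavior)
with torch.no_grad():
    out1 = transformer_encoder(inp)
    out2 = transformer_encoder(inp)

print(torch.equal(out1, out2))  # identical outputs in eval mode
```

Note that eval() only switches the mode; calling train() afterwards restores the stochastic dropout behavior for further training.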
If you want to disable this randomness during training as well, set dropout=0, like so:
nn.TransformerEncoderLayer(d_model=12, nhead=4, dropout=0)
Full testing script:
import torch
import torch.nn as nn
device = 'cpu'
emb = nn.Embedding(10, 12).to(device)
inp1 = torch.LongTensor([1, 2, 3, 4]).to(device)
inp1 = emb(inp1).reshape(inp1.shape[0], 1, 12)  # (S, N, E)
encoder_layer = nn.TransformerEncoderLayer(d_model=12, nhead=4, dropout=0).to(device)
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=4).to(device)
out1 = transformer_encoder(inp1)
out2 = transformer_encoder(inp1)
print((out1-out2).norm())