
How to translate an MLP neural network from TensorFlow to PyTorch

I have built an MLP neural network using TensorFlow, defined as follows:

model_mlp.add(Dense(units=35, input_dim=train_X.shape[1], kernel_initializer='normal', activation='relu'))
model_mlp.add(Dense(units=86, kernel_initializer='normal', activation='relu'))
model_mlp.add(Dense(units=86, kernel_initializer='normal', activation='relu'))
model_mlp.add(Dense(units=10, kernel_initializer='normal', activation='relu'))

I want to convert the above MLP code to PyTorch. How can I do it? I tried the following:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(train_X.shape[1], 35)
        self.fc2 = nn.Linear(35, 86)
        self.fc3 = nn.Linear(86, 86)
        self.fc4 = nn.Linear(86, 10)
        self.fc5 = nn.Linear(10, 1)
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = self.fc5(x)
        return x
    def predict(self, x_test):
        x_test = torch.from_numpy(x_test).float()
        x_test = self.forward(x_test)
        return x_test.view(-1).detach().numpy()
model = MLP()

I use the same dataset, but the two implementations give different results: the TensorFlow code always produces much better results than the PyTorch code. Is my PyTorch code incorrect? If it is correct, how can the difference be explained? I look forward to any replies.


  • Welcome to PyTorch!

    I guess the problem is with the initialization of your network. This is how I would do it:

    import torch
    import torch.nn as nn

    def init_weights(m):
        if isinstance(m, nn.Linear):
            nn.init.xavier_normal_(m.weight)  # Xavier normal (called Glorot in TensorFlow)
            nn.init.constant_(m.bias, 0.0)    # initialize bias with a constant (0.0 is an assumed value)

    class MLP(nn.Module):
        def __init__(self, input_dim):
            super(MLP, self).__init__()
            self.mlp = nn.Sequential(nn.Linear(input_dim, 35), nn.ReLU(),
                                     nn.Linear(35, 86), nn.ReLU(),
                                     nn.Linear(86, 86), nn.ReLU(),
                                     nn.Linear(86, 10), nn.ReLU(),
                                     nn.Linear(10, 1))  # no final ReLU: BCEWithLogitsLoss expects raw logits

        def forward(self, x):
            y = self.mlp(x)
            return y

    model = MLP(input_dim)
    model.apply(init_weights)  # run init_weights on every submodule
    optimizer = torch.optim.Adam(model.parameters())
    loss_func = nn.BCEWithLogitsLoss()
    # training loop
    for data, label in dataloader:
        optimizer.zero_grad()
        pred = model(data)
        loss = loss_func(pred, label)
        loss.backward()
        optimizer.step()
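
If the goal is to mirror the Keras setting `kernel_initializer='normal'` rather than switch to Xavier, something like the sketch below might be a starting point. It assumes that the Keras 'normal' alias resolves to a normal distribution with mean 0 and standard deviation 0.05, and that Dense biases start at zero; check both against your Keras version. The helper name `init_keras_normal` is hypothetical.

```python
import torch
import torch.nn as nn

def init_keras_normal(m, std=0.05):
    """Hypothetical helper: approximate Keras' 'normal' kernel initializer on Linear layers."""
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, mean=0.0, std=std)  # std=0.05 is the assumed Keras default
        nn.init.zeros_(m.bias)                        # assumed Keras default for Dense biases

torch.manual_seed(0)
layer = nn.Linear(1000, 1000)
default_std = layer.weight.std().item()     # spread under PyTorch's default initialization
layer.apply(init_keras_normal)              # apply() also visits the module itself
keras_like_std = layer.weight.std().item()  # spread after the Keras-like initialization
print(default_std, keras_like_std)
```

Comparing the two printed values gives a rough idea of how far apart the default starting weights of the two frameworks can be for the same layer shape.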

    Note that in PyTorch we call model(x) rather than model.forward(x). nn.Module.__call__() runs any registered hooks around forward(), so calling forward() directly skips them.
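
The difference between the two call styles can be seen with a forward hook. This is a minimal sketch on a single `nn.Linear` layer; the hook name `log_hook` and the layer sizes are made up for illustration.

```python
import torch
import torch.nn as nn

calls = []

def log_hook(module, inputs, output):
    # Forward hooks receive the module, a tuple of inputs, and the output.
    calls.append(tuple(output.shape))

layer = nn.Linear(4, 2)
handle = layer.register_forward_hook(log_hook)
x = torch.randn(3, 4)

layer(x)                       # goes through __call__, so the hook fires
n_after_call = len(calls)
layer.forward(x)               # bypasses __call__, so the hook is skipped
n_after_forward = len(calls)
handle.remove()                # detach the hook again

print(n_after_call, n_after_forward)
```

The hook is recorded only for the `layer(x)` call, which is why a method such as `predict` is usually better written with `self(x)` than with `self.forward(x)`.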

    You can check the documentation of weight initialization here: