While learning PyTorch, I tried to replicate an existing TensorFlow architecture, but I've run into a significant performance gap. The TensorFlow model learns well within 25 epochs, whereas the PyTorch model needs at least 250 epochs to generalize comparably. I have carefully aligned the architectures of both networks and gone over the code repeatedly, but the disparity persists. Can anyone shed light on what else might be amiss here?
In the subsequent section, I'll present the full Python code for both implementations, along with the CLI output and graphical visualization.
Reproducibility: As I prefer not to share the original dataset, I've attached a piece of code that emulates the dataset instead. The generated data_inverter.csv can be used to reproduce the observed behavior.
PyTorch code:
# Standard library imports
import pandas as pd
import matplotlib.pyplot as plt
# External library imports
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler, StandardScaler
from sklearn.metrics import max_error, mean_absolute_error, mean_squared_error
# Loading dataset
df_data = pd.read_csv("./data_inverter.csv", names=["pvt", "edge", "slew", "load", "delay"])
# Selecting subset of data based on specific conditions
df_select = df_data[(df_data["pvt"] == "PtypV1500T027") & (df_data["edge"] == "rise")]
# Splitting features and target variable
X = df_select.drop(["pvt", "edge", "delay"], axis='columns')
y = df_select["delay"]
# Scaling input features using Min-Max scaling
slew_scaler = MinMaxScaler()
load_scaler = MinMaxScaler()
X_scaled = X.copy()
X_scaled["slew"] = slew_scaler.fit_transform(X_scaled.slew.values.reshape(-1, 1))
X_scaled["load"] = load_scaler.fit_transform(X_scaled.load.values.reshape(-1, 1))
# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.1, random_state=42)
# Converting data to PyTorch tensors
X_train_tensor = torch.FloatTensor(X_train.values)
y_train_tensor = torch.FloatTensor(y_train.values).view(-1, 1)
X_test_tensor = torch.FloatTensor(X_test.values)
y_test_tensor = torch.FloatTensor(y_test.values).view(-1, 1)
# Setting random seed for reproducibility
# Defining neural network architecture
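model = torch.nn.Sequential(
    torch.nn.Linear(X_train_tensor.shape[1], 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 1),
    torch.nn.ELU(),
)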
# Loss function and optimizer
criterion = torch.nn.MSELoss()
criterion_val = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())
# Training the model
num_epochs = 25
progress = {'loss': [], 'mae': [], 'mse': [], 'val_loss': [], 'val_mae': [], 'val_mse': []}
for epoch in range(num_epochs):
    # Forward pass (the whole training set in one batch)
    y_predict = model(X_train_tensor)
    loss = criterion(y_predict, y_train_tensor)
    # Backward and optimize (one weight update per epoch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Validation
    with torch.no_grad():
        y_test_predict = model(X_test_tensor)
        loss_val = criterion_val(y_test_predict, y_test_tensor)
    # Record progress
    progress['loss'].append(loss.item())
    progress['val_loss'].append(loss_val.item())
    progress['mae'].append(mean_absolute_error(y_train_tensor, y_predict.detach().numpy()))
    progress['mse'].append(mean_squared_error(y_train_tensor, y_predict.detach().numpy()))
    progress['val_mae'].append(mean_absolute_error(y_test_tensor, y_test_predict.detach().numpy()))
    progress['val_mse'].append(mean_squared_error(y_test_tensor, y_test_predict.detach().numpy()))
    print("Epoch %i/%i - loss: %0.5F" % (epoch, num_epochs, loss.item()))
# Displaying model summary
print(model)
# Plotting training progress
df_progress = pd.DataFrame(progress)
df_progress.plot()
plt.title("Model training progress: DNN PyTorch")
plt.show()
# Making predictions on the testing set
with torch.no_grad():
    y_predict_tensor = model(X_test_tensor)
y_predict = y_predict_tensor.numpy()
# Displaying model performance metrics
print("Model performance metrics: DNN PyTorch")
print("MAX error:", max_error(y_test_tensor, y_predict))
print("MAE error:", mean_absolute_error(y_test_tensor, y_predict))
print("MSE error:", mean_squared_error(y_test_tensor, y_predict, squared=False))
plt.scatter(y_test, y_predict)
plt.scatter(y_test, y_test, marker='.')
plt.title("Model predictions: DNN PyTorch")
TensorFlow code:
# Standard library imports
import pandas as pd
import matplotlib.pyplot as plt
# External library imports
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler, StandardScaler
from sklearn.metrics import max_error, mean_absolute_error, mean_squared_error
# Loading dataset
df_data = pd.read_csv("./data_inverter.csv", names=["pvt", "edge", "slew", "load", "delay"])
# Selecting subset of data based on specific conditions
df_select = df_data[(df_data["pvt"] == "PtypV1500T027") & (df_data["edge"] == "rise")]
# Splitting features and target variable
X = df_select.drop(["pvt", "edge", "delay"], axis='columns')
y = df_select["delay"]
# Scaling input features using Min-Max scaling
slew_scaler = MinMaxScaler()
load_scaler = MinMaxScaler()
X_scaled = X.copy()
X_scaled["slew"] = slew_scaler.fit_transform(X_scaled.slew.values.reshape(-1, 1))
X_scaled["load"] = load_scaler.fit_transform(X_scaled.load.values.reshape(-1, 1))
# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.1, random_state=42)
# Converting data to TensorFlow tensors
X_train_tensor = tf.constant(X_train.values, dtype=tf.float32)
y_train_tensor = tf.constant(y_train.values, dtype=tf.float32)
X_test_tensor = tf.constant(X_test.values, dtype=tf.float32)
y_test_tensor = tf.constant(y_test.values, dtype=tf.float32)
# Setting random seed for reproducibility
# Defining neural network architecture
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_dim=X_train_tensor.shape[1]),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='elu')
])
# Compiling the model
model.compile(
    loss=tf.keras.losses.MeanSquaredError(),  # Using Mean Squared Error loss function
    optimizer=tf.keras.optimizers.Adam(),  # Using Adam optimizer
    metrics=['mae', 'mse']  # Using Mean Absolute Error and Mean Squared Error as metrics
)
# Training the model
progress = model.fit(X_train_tensor, y_train_tensor, validation_data=(X_test_tensor, y_test_tensor), epochs=25)
# Evaluating model performance on the testing set
model.evaluate(X_test_tensor, y_test_tensor, verbose=2)
# Displaying model summary
model.summary()
# Plotting training progress
pd.DataFrame(progress.history).plot()
plt.title("Model training progress: DNN TensorFlow")
plt.show()
# Making predictions on the testing set
y_predict = model.predict(X_test_tensor)
# Displaying model performance metrics
print("Model performance metrics: DNN TensorFlow")
print("MAX error:", max_error(y_test_tensor, y_predict))
print("MAE error:", mean_absolute_error(y_test_tensor, y_predict))
print("MSE error:", mean_squared_error(y_test_tensor, y_predict, squared=False))
plt.scatter(y_test, y_predict)
plt.scatter(y_test, y_test, marker='.')
plt.title("Model predictions: DNN TensorFlow")
CLI output of PyTorch model performance metrics after 25 epochs:
(0): Linear(in_features=2, out_features=128, bias=True)
(1): ReLU()
(2): Linear(in_features=128, out_features=128, bias=True)
(3): ReLU()
(4): Linear(in_features=128, out_features=64, bias=True)
(5): ReLU()
(6): Linear(in_features=64, out_features=32, bias=True)
(7): ReLU()
(8): Linear(in_features=32, out_features=16, bias=True)
(9): ReLU()
(10): Linear(in_features=16, out_features=1, bias=True)
(11): ELU(alpha=1.0)
Model performance metrics: DNN PyTorch
MAX error: 1.2864852
MAE error: 0.3353702
MSE error: 0.42874745
CLI output of TensorFlow model performance metrics after 25 epochs:
Model: "sequential"
Layer (type) Output Shape Param #
dense (Dense) (None, 128) 384
dense_1 (Dense) (None, 128) 16512
dense_2 (Dense) (None, 64) 8256
dense_3 (Dense) (None, 32) 2080
dense_4 (Dense) (None, 16) 528
dense_5 (Dense) (None, 1) 17
Total params: 27777 (108.50 KB)
Trainable params: 27777 (108.50 KB)
Non-trainable params: 0 (0.00 Byte)
6/6 [==============================] - 0s 750us/step
Model performance metrics: DNN TensorFlow
MAX error: 0.013849139
MAE error: 0.0029576812
MSE error: 0.0036013061
PyTorch scatter plot (orange = target against itself, blue = target against prediction):
TensorFlow scatter plot (orange = target against itself, blue = target against prediction):
Additional info (in response to the questions and comments):
- For tf.keras.optimizers.Adam(), the default learning rate is 0.001.
- For torch.optim.Adam(), the default learning rate is 0.001.
Here's the PyTorch model performance after 250 epochs:
(0): Linear(in_features=2, out_features=128, bias=True)
(1): ReLU()
(2): Linear(in_features=128, out_features=128, bias=True)
(3): ReLU()
(4): Linear(in_features=128, out_features=64, bias=True)
(5): ReLU()
(6): Linear(in_features=64, out_features=32, bias=True)
(7): ReLU()
(8): Linear(in_features=32, out_features=16, bias=True)
(9): ReLU()
(10): Linear(in_features=16, out_features=1, bias=True)
(11): ELU(alpha=1.0)
Model performance metrics: DNN PyTorch
MAX error: 0.025619686
MAE error: 0.006687804
MSE error: 0.008531998
If you want to reproduce the issue, you can use this code to emulate the dataset:
import csv
import math
x_values = [0.003, 0.00354604, 0.00546274, 0.00912297, 0.0148254, 0.0228266, 0.0333551, 0.0466191, 0.0628111, 0.0821111, 0.104689, 0.130705, 0.160313, 0.193659, 0.230886, 0.272128, 0.317517, 0.36718, 0.42124, 0.479818, 0.54303, 0.61099, 0.683809, 0.761595, 0.844455, 0.932492, 1.02581, 1.1245, 1.22868, 1.33842, 1.45383, 1.57501, 1.70203, 1.835, 1.974]
y_values = [0.001, 0.00102008, 0.00109058, 0.0012252, 0.00143494, 0.00172922, 0.00211646, 0.0026043, 0.00319984, 0.0039097, 0.0047401, 0.00569697, 0.00678594, 0.00801243, 0.00938161, 0.0108985, 0.0125679, 0.0143945, 0.0163828, 0.0185373, 0.0208622, 0.0233618, 0.0260401, 0.028901, 0.0319486, 0.0351866, 0.0386187, 0.0422487, 0.0460802, 0.0501166, 0.0543615, 0.0588182, 0.0634902, 0.0683808, 0.0734931, 0.0788305, 0.0843961, 0.0901929, 0.0962242, 0.102493, 0.109002, 0.115755, 0.122753, 0.130001, 0.137502, 0.145257, 0.153269, 0.161543, 0.170079, 0.178881]
z_values = [[math.sqrt(5*(x+0.25)) * math.sqrt(3*(y+0.005)) for y in y_values] for x in x_values]
with open("./data_inverter.csv", 'w', newline='') as fid:
    writer = csv.writer(fid)
    for i in range(len(x_values)):
        for j in range(len(y_values)):
            writer.writerow(["PtypV1500T027", "rise", x_values[i], y_values[j], z_values[i][j]])
The difference is that TensorFlow's model.fit defaults to mini-batching* (with a batch size of 32, see the doc of tf.keras.Model.fit), while your PyTorch training loop is simply batching*: it feeds the whole training set through the model once per epoch. As a result, your PyTorch model performs only 25 weight updates, while the TensorFlow model performs ceil(N/32)*25 (where N is your number of training samples), and is therefore able to find a better local minimum.
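To make the update count concrete, here is a small back-of-the-envelope calculation. It assumes the emulated dataset from the question (35 x 50 = 1750 rows, of which test_size=0.1 leaves 1575 for training); the variable names are only illustrative.

```python
import math

# Assumed sizes: the emulated dataset has 35 x 50 = 1750 rows and
# train_test_split(test_size=0.1) keeps 1575 of them for training.
n_train = 1575
batch_size = 32   # Keras default batch size for model.fit
num_epochs = 25

# Mini-batching: one weight update per batch, every epoch
batches_per_epoch = math.ceil(n_train / batch_size)
updates_mini_batch = batches_per_epoch * num_epochs

# Full-batch training: one weight update per epoch
updates_full_batch = 1 * num_epochs

print(batches_per_epoch)    # 50
print(updates_mini_batch)   # 1250
print(updates_full_batch)   # 25
```

Under these assumptions the TensorFlow model gets about 50 times more optimizer steps in the same 25 epochs.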
By implementing mini-batching, you get similar results in PyTorch:
batch_size = 32
for epoch in range(num_epochs):
    # Forward pass
    batches = list()
    # mini-batching
    for x_batch, y_true in zip(
        torch.split(X_train_tensor, batch_size, dim=0),
        torch.split(y_train_tensor, batch_size, dim=0),
    ):
        y_predict_batch = model(x_batch)
        batches.append(y_predict_batch)
        loss = criterion(y_predict_batch, y_true)
        # Backward and optimize (one weight update per mini-batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    y_predict = torch.concat(batches, dim=0)
    # Validation
    with torch.no_grad():
        y_test_predict = model(X_test_tensor)
        print(y_test_predict.shape, y_test_tensor.shape)
        loss_val = criterion_val(y_test_predict, y_test_tensor)
    # Record progress ('loss' is the loss of the last mini-batch)
    progress['loss'].append(loss.item())
    progress['val_loss'].append(loss_val.item())
    progress['mae'].append(mean_absolute_error(y_train_tensor, y_predict.detach().numpy()))
    progress['mse'].append(mean_squared_error(y_train_tensor, y_predict.detach().numpy()))
    progress['val_mae'].append(mean_absolute_error(y_test_tensor, y_test_predict.detach().numpy()))
    progress['val_mse'].append(mean_squared_error(y_test_tensor, y_test_predict.detach().numpy()))
    print("Epoch %i/%i - loss: %0.5F" % (epoch, num_epochs, loss.item()))
I would suggest using the torch.utils.data module (Dataset and DataLoader) to do the mini-batching rather than my implementation.
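As a rough sketch of what that could look like, the snippet below uses TensorDataset and DataLoader from torch.utils.data. The stand-in data, the small two-layer model, and every name ending in _demo are assumptions made for illustration, not the question's actual tensors or network; shuffle=True is chosen here because Keras' model.fit also shuffles by default.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-in data with the same shapes as X_train_tensor / y_train_tensor
# (hypothetical values, not the real dataset).
X_demo = torch.rand(1575, 2)
y_demo = torch.sqrt(5 * (X_demo[:, :1] + 0.25)) * torch.sqrt(3 * (X_demo[:, 1:] + 0.005))

# DataLoader handles splitting into mini-batches and reshuffling every epoch
dataset = TensorDataset(X_demo, y_demo)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

# Deliberately small model, only to keep the sketch short
model_demo = torch.nn.Sequential(
    torch.nn.Linear(2, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 1),
)
criterion_demo = torch.nn.MSELoss()
optimizer_demo = torch.optim.Adam(model_demo.parameters())

num_updates = 0
for epoch in range(2):
    for x_batch, y_batch in loader:
        optimizer_demo.zero_grad()
        loss = criterion_demo(model_demo(x_batch), y_batch)
        loss.backward()
        optimizer_demo.step()  # one weight update per mini-batch
        num_updates += 1

print(len(loader), num_updates)
```

In the question's script, the same loop would iterate over a TensorDataset built from X_train_tensor and y_train_tensor instead of the stand-in data.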
*: for the difference between batching and mini-batching, see this question: What is the meaning of a 'mini-batch' in deep learning?
You could theoretically compensate for the larger batch size by using a larger learning rate in PyTorch. I get not completely terrible results with a learning rate of 0.02, but:
With a bit of tuning (like using SGD and a scheduler), you could probably get better results, but mini-batching is just much easier in that case.
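For completeness, here is a minimal sketch of the full-batch alternative with a larger learning rate and a scheduler. It keeps Adam with lr=0.02 as mentioned above and adds torch.optim.lr_scheduler.ExponentialLR with gamma=0.9; the scheduler type, the gamma value, the stand-in data, and the small model are all assumptions picked for illustration, not tuned settings.

```python
import torch

torch.manual_seed(0)

# Stand-in data (hypothetical values, not the real dataset)
X_demo = torch.rand(1575, 2)
y_demo = torch.sqrt(5 * (X_demo[:, :1] + 0.25)) * torch.sqrt(3 * (X_demo[:, 1:] + 0.005))

# Deliberately small model, only to keep the sketch short
model_demo = torch.nn.Sequential(
    torch.nn.Linear(2, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 1),
)
criterion_demo = torch.nn.MSELoss()

# Larger starting learning rate, decayed by 10% after every epoch
optimizer_demo = torch.optim.Adam(model_demo.parameters(), lr=0.02)
scheduler_demo = torch.optim.lr_scheduler.ExponentialLR(optimizer_demo, gamma=0.9)

for epoch in range(25):
    # Full-batch training: a single weight update per epoch
    optimizer_demo.zero_grad()
    loss = criterion_demo(model_demo(X_demo), y_demo)
    loss.backward()
    optimizer_demo.step()
    scheduler_demo.step()

final_lr = scheduler_demo.get_last_lr()[0]
print(final_lr)
```

This still performs only 25 updates, so the learning rate and its decay would need tuning for each problem, whereas mini-batching works with the optimizer defaults.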