Tags: python, tensorflow, normalization

Implementing Normalization Inside a TensorFlow Model


I'm currently playing around with a basic LSTM-based autoencoder using the TensorFlow library. The goal is for the autoencoder to reconstruct multivariate time series. I'm interested in moving the feature-wise normalization of the data from the data pipeline to inside the model.

Currently I normalize the data the following way:

from tensorflow.keras import Input
from tensorflow.keras.layers import Dense, LSTM, TimeDistributed, Normalization
from tensorflow.keras.models import Model

# fit the normalization statistics on the training data, then
# transform the data before it ever reaches the model
normalizer = Normalization(axis=-1)
normalizer.adapt(data_train)
data_train = normalizer(data_train)

inputs = Input(shape=[None, n_inputs])
x = LSTM(4, return_sequences=True)(inputs)
x = LSTM(2, return_sequences=True)(x)
x = LSTM(2, return_sequences=True)(x)
x = LSTM(4, return_sequences=True)(x)
x = TimeDistributed(Dense(n_inputs))(x)
model = Model(inputs, x)

This works as intended and yields a respectable loss (~1e-2), but the normalization happens outside the model. According to the documentation (under "Preprocessing data before the model or inside the model"), the following code should be equivalent to the snippet above, except that it runs inside the model:

normalizer = Normalization(axis=-1)
normalizer.adapt(data_train)

inputs = Input(shape=[None, n_inputs])
x = normalizer(inputs)
x = LSTM(4, return_sequences=True)(x)
x = LSTM(2, return_sequences=True)(x)
x = LSTM(2, return_sequences=True)(x)
x = LSTM(4, return_sequences=True)(x)
x = TimeDistributed(Dense(n_inputs))(x)
model = Model(inputs, x)

However, running the latter variant leads to astronomical loss values (~1e3) and to noticeably worse results in testing. Hence my question: what am I doing wrong? Could it be that I'm misunderstanding the documentation?

Any advice greatly appreciated!


Solution

  • The two methods seem to give consistent results as long as the normalizer is applied only to the inputs (i.e. to the feature matrix) when it is used outside the model, as the example below demonstrates (see also the autoencoder sketch after it):

    import numpy as np
    from tensorflow.keras import Input
    # Normalization graduated from layers.experimental.preprocessing in TF 2.6
    from tensorflow.keras.layers import Dense, LSTM, TimeDistributed, Normalization
    from tensorflow.keras.models import Model
    np.random.seed(42)
    
    # define the input parameters
    num_samples = 100
    time_steps = 10
    train_size = 0.8
    
    # generate the data
    X = np.random.normal(loc=10, scale=5, size=(num_samples, time_steps, 1))
    y = np.mean(X, axis=1) + np.random.normal(loc=0, scale=1, size=(num_samples, 1))
    
    # split the data (np.int was removed in NumPy 1.24; use the builtin int)
    train_len = int(train_size * num_samples)
    X_train, X_test = X[:train_len], X[train_len:]
    y_train, y_test = y[:train_len], y[train_len:]
    
    # normalize the inputs inside the model
    normalizer = Normalization()
    normalizer.adapt(X_train)
    
    inputs = Input(shape=[None, 1])
    x = normalizer(inputs)
    x = LSTM(4, return_sequences=True)(x)
    x = LSTM(2, return_sequences=True)(x)
    x = LSTM(2, return_sequences=True)(x)
    x = LSTM(4, return_sequences=True)(x)
    x = TimeDistributed(Dense(1))(x)
    model = Model(inputs, x)
    
    model.compile(loss='mae', optimizer='adam')
    model.fit(X_train, y_train, batch_size=32, epochs=10, verbose=0)
    
    print(model.evaluate(X_test, y_test))
    # 10.704551696777344
    
    # normalize the inputs outside the model
    normalizer = Normalization()
    normalizer.adapt(X_train)
    
    X_train_normalized = normalizer(X_train)
    X_test_normalized = normalizer(X_test)
    
    inputs = Input(shape=[None, 1])
    x = LSTM(4, return_sequences=True)(inputs)
    x = LSTM(2, return_sequences=True)(x)
    x = LSTM(2, return_sequences=True)(x)
    x = LSTM(4, return_sequences=True)(x)
    x = TimeDistributed(Dense(1))(x)
    model = Model(inputs, x)
    
    model.compile(loss='mae', optimizer='adam')
    model.fit(X_train_normalized, y_train, batch_size=32, epochs=10, verbose=0)
    
    print(model.evaluate(X_test_normalized, y_test))
    # 10.748750686645508
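
  • Applied to the autoencoder in the question, this points at the likely culprit: there, data_train serves as both the input and the reconstruction target, so normalizing it in the pipeline also rescaled the targets, while moving the Normalization layer inside the model left the targets at their original, unnormalized scale, which alone would explain loss values that are orders of magnitude larger. A minimal sketch of how the in-model variant could be made equivalent, reusing the data_train and n_inputs names assumed in the question:

    # keep input normalization inside the model ...
    normalizer = Normalization(axis=-1)
    normalizer.adapt(data_train)

    inputs = Input(shape=[None, n_inputs])
    x = normalizer(inputs)
    x = LSTM(4, return_sequences=True)(x)
    x = LSTM(2, return_sequences=True)(x)
    x = LSTM(2, return_sequences=True)(x)
    x = LSTM(4, return_sequences=True)(x)
    x = TimeDistributed(Dense(n_inputs))(x)
    model = Model(inputs, x)
    model.compile(loss='mae', optimizer='adam')

    # ... but normalize the reconstruction targets as well, so that both
    # variants compare predictions and targets on the same scale
    model.fit(data_train, normalizer(data_train), batch_size=32, epochs=10)

    Alternatively, newer TensorFlow releases also accept Normalization(invert=True), which, adapted on the same data, can be appended after the final Dense layer to map reconstructions back to the original scale instead.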