Tags: python, tensorflow, normalization

Implementing Normalization Inside a TensorFlow Model


I'm currently playing around with a basic LSTM-based autoencoder using the TensorFlow library. The goal is for the autoencoder to reconstruct multivariate time series. I'm interested in moving the feature-wise normalization of the data from the data pipeline to inside the model.

Currently I normalize the data the following way:

from tensorflow.keras import Input
from tensorflow.keras.layers import Dense, LSTM, TimeDistributed, Normalization
from tensorflow.keras.models import Model

# fit the normalization statistics on the training data, then
# transform the data before it ever reaches the model
normalizer = Normalization(axis=-1)
normalizer.adapt(data_train)
data_train = normalizer(data_train)

inputs = Input(shape=[None, n_inputs])
x = LSTM(4, return_sequences=True)(inputs)
x = LSTM(2, return_sequences=True)(x)
x = LSTM(2, return_sequences=True)(x)
x = LSTM(4, return_sequences=True)(x)
x = TimeDistributed(Dense(n_inputs))(x)
model = Model(inputs, x)

This works as intended and yields a respectable loss (~1e-2), but the normalization happens outside the model. According to the documentation (under "Preprocessing data before the model or inside the model"), the following code should be equivalent to the snippet above, except that it runs inside the model:

normalizer = Normalization(axis=-1)
normalizer.adapt(data_train)

inputs = Input(shape=[None, n_inputs])
x = normalizer(inputs)
x = LSTM(4, return_sequences=True)(x)
x = LSTM(2, return_sequences=True)(x)
x = LSTM(2, return_sequences=True)(x)
x = LSTM(4, return_sequences=True)(x)
x = TimeDistributed(Dense(n_inputs))(x)
model = Model(inputs, x)

However, running the latter variant leads to astronomical loss values (~1e3) and to noticeably worse results in testing. Hence my question: what am I doing wrong? Could it be that I'm misunderstanding the documentation?

Any advice greatly appreciated!


Solution

  • The two methods seem to give consistent results as long as the normalizer is applied only to the inputs (i.e. to the feature matrix) when it is used outside the model, as the example below demonstrates (see also the autoencoder sketch after it):

    import numpy as np
    from tensorflow.keras import Input
    # Normalization graduated from layers.experimental.preprocessing in TF 2.6
    from tensorflow.keras.layers import Dense, LSTM, TimeDistributed, Normalization
    from tensorflow.keras.models import Model
    np.random.seed(42)
    
    # define the input parameters
    num_samples = 100
    time_steps = 10
    train_size = 0.8
    
    # generate the data
    X = np.random.normal(loc=10, scale=5, size=(num_samples, time_steps, 1))
    y = np.mean(X, axis=1) + np.random.normal(loc=0, scale=1, size=(num_samples, 1))
    
    # split the data (np.int was removed in NumPy 1.24; use the builtin int)
    train_len = int(train_size * num_samples)
    X_train, X_test = X[:train_len], X[train_len:]
    y_train, y_test = y[:train_len], y[train_len:]
    
    # normalize the inputs inside the model
    normalizer = Normalization()
    normalizer.adapt(X_train)
    
    inputs = Input(shape=[None, 1])
    x = normalizer(inputs)
    x = LSTM(4, return_sequences=True)(x)
    x = LSTM(2, return_sequences=True)(x)
    x = LSTM(2, return_sequences=True)(x)
    x = LSTM(4, return_sequences=True)(x)
    x = TimeDistributed(Dense(1))(x)
    model = Model(inputs, x)
    
    model.compile(loss='mae', optimizer='adam')
    model.fit(X_train, y_train, batch_size=32, epochs=10, verbose=0)
    
    print(model.evaluate(X_test, y_test))
    # 10.704551696777344
    
    # normalize the inputs outside the model
    normalizer = Normalization()
    normalizer.adapt(X_train)
    
    X_train_normalized = normalizer(X_train)
    X_test_normalized = normalizer(X_test)
    
    inputs = Input(shape=[None, 1])
    x = LSTM(4, return_sequences=True)(inputs)
    x = LSTM(2, return_sequences=True)(x)
    x = LSTM(2, return_sequences=True)(x)
    x = LSTM(4, return_sequences=True)(x)
    x = TimeDistributed(Dense(1))(x)
    model = Model(inputs, x)
    
    model.compile(loss='mae', optimizer='adam')
    model.fit(X_train_normalized, y_train, batch_size=32, epochs=10, verbose=0)
    
    print(model.evaluate(X_test_normalized, y_test))
    # 10.748750686645508
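
  • Applied to the autoencoder in the question, this points at the likely culprit: there, data_train serves as both the input and the reconstruction target, so normalizing it in the pipeline also rescaled the targets, while moving the Normalization layer inside the model left the targets at their original, unnormalized scale, which alone would explain loss values that are orders of magnitude larger. A minimal sketch of how the in-model variant could be made equivalent, reusing the data_train and n_inputs names assumed in the question:

    # keep input normalization inside the model ...
    normalizer = Normalization(axis=-1)
    normalizer.adapt(data_train)

    inputs = Input(shape=[None, n_inputs])
    x = normalizer(inputs)
    x = LSTM(4, return_sequences=True)(x)
    x = LSTM(2, return_sequences=True)(x)
    x = LSTM(2, return_sequences=True)(x)
    x = LSTM(4, return_sequences=True)(x)
    x = TimeDistributed(Dense(n_inputs))(x)
    model = Model(inputs, x)
    model.compile(loss='mae', optimizer='adam')

    # ... but normalize the reconstruction targets as well, so that both
    # variants compare predictions and targets on the same scale
    model.fit(data_train, normalizer(data_train), batch_size=32, epochs=10)

    Alternatively, newer TensorFlow releases also accept Normalization(invert=True), which, adapted on the same data, can be appended after the final Dense layer to map reconstructions back to the original scale instead.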