Tags: python, tensorflow, machine-learning, keras, deep-learning

Train neural network for Absolute function with minimum Layers


I'm trying to train a neural network to learn the function y = |x|. The absolute-value function consists of two straight lines meeting at zero, so I'm trying the following Sequential model:

Hidden layer: Dense(2) with ReLU activation. Output layer: Dense(1).

After training, the model only fits one half of the function. Most of the time it is the right-hand side, sometimes the left. As soon as I add one more layer, so instead of 2 I have 3, it fits the function perfectly. Can anyone explain why an extra layer is needed when the absolute-value function has only one kink?

Here is the code:

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Generate training data for y = |x|
X = np.linspace(-1000, 1000, 400)
np.random.shuffle(X)
Y = np.abs(X)

# Reshape data to fit the model input
X = X.reshape(-1, 1)
Y = Y.reshape(-1, 1)

# Build the model
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(2, activation='relu'),
    tf.keras.layers.Dense(1)
])

# Compile the model
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
model.fit(X, Y, epochs=1000)

# Predict using the model
Y_pred = model.predict(X)

# Plot the results
plt.scatter(X, Y, color='blue', label='Actual')
plt.scatter(X, Y_pred, color='red', label='Predicted')
plt.title('Actual vs Predicted')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()

Plot for 2 Dense Layer: (image not reproduced here; the prediction matches only one half of y = |x|)

Plot for 3 Dense Layer: (image not reproduced here; the prediction matches the full function)


Solution

  • It depends on the weight initialization.

    If both weights of the hidden layer are initialized with positive values, the network can only fit the positive inputs. For negative inputs both ReLU units output zero, so the prediction stays flat there. This also means there are no gradients: there is no small change to the weights that would make the output match a bit better.

    So either switch to a different activation function, such as leaky ReLU, which also passes some signal for negative values, or change the initialization.
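
    For intuition (a minimal sketch, not part of the original answer): with one positive and one negative hidden weight, the 2-unit network can represent the target exactly, because |x| = relu(x) + relu(-x). Setting the weights by hand shows this:

    import numpy as np
    import tensorflow as tf

    # Hand-set weights: |x| = relu(1*x) + relu(-1*x), summed with output weights of +1
    abs_net = tf.keras.models.Sequential([
        tf.keras.layers.Dense(2, activation='relu', input_shape=(1,)),
        tf.keras.layers.Dense(1)
    ])
    abs_net.layers[0].set_weights([np.array([[1.0, -1.0]]), np.zeros(2)])   # hidden kernel, bias
    abs_net.layers[1].set_weights([np.array([[1.0], [1.0]]), np.zeros(1)])  # output kernel, bias

    x_test = np.array([[-3.0], [-1.0], [0.0], [2.0]])
    print(abs_net.predict(x_test))  # ~ [[3.], [1.], [0.], [2.]], i.e. |x|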

    In the code below I demonstrate it with different custom inits.

    good_init sets one weight to a positive value and one to a negative value, and the problem gets solved. Both bad_init variants set the weights to the same sign, and only half of the domain is learned.

    import numpy as np
    import tensorflow as tf
    import matplotlib.pyplot as plt
    from keras import backend as K

    X = np.linspace(-1000, 1000, 400)
    np.random.shuffle(X)
    Y = np.abs(X)

    # Reshape data to fit the model input
    X = X.reshape(-1, 1)
    Y = Y.reshape(-1, 1)
    
    def good_init(shape, dtype=None):
        # one positive, one negative weight
        val=np.linspace(-1,1,np.prod(shape)).reshape(shape)
        return K.variable(value=val, dtype=dtype)
    
    def bad_init_right(shape, dtype=None):
        # both weights positive, only right side works
        val=np.linspace(-1,1,np.prod(shape)).reshape(shape)
        val=np.abs(val)
        return K.variable(value=val, dtype=dtype)
    
    
    def bad_init_left(shape, dtype=None):
        # both weights negative, only left side works
        val=np.linspace(-1,1,np.prod(shape)).reshape(shape)
        val=-np.abs(val)
        return K.variable(value=val, dtype=dtype)
    
    # Build the model (swap kernel_initializer between good_init, bad_init_right and bad_init_left to compare)
    model = tf.keras.models.Sequential([
        tf.keras.layers.Dense(2, activation='relu', kernel_initializer=bad_init_left),
        tf.keras.layers.Dense(1)
    ])
    
    # Compile the model
    model.compile(optimizer='adam', loss='mse', metrics=['mae'])
    model.fit(X, Y, epochs=100)

    # Predict using the model
    Y_pred = model.predict(X)
    
    # Plot the results
    plt.scatter(X, Y, color='blue', label='Actual')
    plt.scatter(X, Y_pred, color='red', label='Predicted')
    plt.title('Actual vs Predicted')
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.legend()
    plt.show()
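
    As a quick follow-up check (a hypothetical addition, not in the original answer): if you rebuild the model above with kernel_initializer=good_init and retrain, you can print the learned weights and confirm that the hidden-layer kernel ends up with one positive and one negative entry, matching the relu(x) + relu(-x) decomposition sketched earlier.

    # Hypothetical follow-up: inspect the learned weights after training with good_init
    w_hidden, b_hidden = model.layers[0].get_weights()
    w_out, b_out = model.layers[1].get_weights()
    print("hidden kernel:", w_hidden.flatten())  # expect one positive and one negative weight
    print("output kernel:", w_out.flatten())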