Tags: python, numpy, tensorflow, keras, neural-network

Neural network to approximate the square function gives 0 output


I'm trying to build a neural network to approximate the squares of numbers from -50 to 50. I based my code on the code in this answer:

import tensorflow as tf
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import regularizers

x_train = np.random.random((10000,1))*100-50
y_train = np.square(x_train)

model = Sequential(
    [
        Dense(8, activation='relu', kernel_regularizer=regularizers.l2(0.001), input_shape=(1,)),
        Dense(8, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
        Dense(1, activation='relu'),
    ]
)

batch_size = 32
epochs = 100

model.compile(loss='mse', optimizer='adam')
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1)

x = "n" 
while True:
    print("enter num:")
    x = input()
    if x == "end":
        break

    X = int(x)

    predicted_sum = model.predict(np.array([X]))
    print(predicted_sum)

The problem is that every input produces the output "[[0.]]". I don't know what causes this or how to fix it. Can someone help?

This message is displayed immediately after the code starts running; does it have something to do with the problem?

oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.

Solution

  • I get reasonable results by changing the batch size from 32 to 256, changing the number of epochs from 100 to 1000, and removing the ReLU activation function from the last layer; a sketch of the modified script is shown after these notes.

    ReLU has the problem that if its input is x < 0, then ReLU(x) = 0 and ReLU'(x) = 0. This means that if the pre-activation of the final layer happens to be negative, the output is stuck at 0 and there is no gradient to correct it (a minimal check of this is included below).

    This is also a problem in the earlier layers, but it is much more likely that the single output neuron gets unlucky in this way than that all eight neurons of a hidden layer do.

    As an alternative to removing it, you could also look into leaky ReLU.

    Given that the code in the referenced answer trains for 15,000 epochs, it is not surprising that 100 epochs gives much worse results. I also changed the batch size to 256 to make training run faster.
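Here is a minimal check of the gradient claim above (assuming TensorFlow 2's default eager execution): for a negative pre-activation, ReLU clamps the output to 0 and its gradient is 0, so gradient descent has nothing with which to push the neuron back out of that region.

import tensorflow as tf

z = tf.constant(-3.0)              # a negative pre-activation at the output neuron
with tf.GradientTape() as tape:
    tape.watch(z)                  # constants are not watched automatically
    y = tf.nn.relu(z)              # output is clamped to 0.0
grad = tape.gradient(y, z)         # derivative of ReLU is 0 for z < 0
print(y.numpy(), grad.numpy())     # prints: 0.0 0.0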
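And here is a sketch of the training script with those changes applied, using the same data generation as in the question; the only differences are the linear output layer, batch_size=256, and epochs=1000, and those two numbers are a starting point rather than a tuned setting.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import regularizers

x_train = np.random.random((10000, 1)) * 100 - 50
y_train = np.square(x_train)

model = Sequential(
    [
        Dense(8, activation='relu', kernel_regularizer=regularizers.l2(0.001), input_shape=(1,)),
        Dense(8, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
        Dense(1),  # no activation: a linear output can be negative and still receive a gradient
    ]
)

model.compile(loss='mse', optimizer='adam')
model.fit(x_train, y_train, batch_size=256, epochs=1000, verbose=1)

print(model.predict(np.array([[7.0]])))  # should print something close to [[49.]]

If you would rather keep an activation on the output layer, a leaky variant (for example `Dense(1, activation=tf.nn.leaky_relu)`, or a plain `Dense(1)` followed by a `tf.keras.layers.LeakyReLU()` layer) has a nonzero gradient for negative inputs and so avoids getting stuck at zero in the same way.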