deep-learning, neural-network, regression

Can a neural network converge to a completely random function?


I am trying to train a DNN to converge to a random function (i.e., targets drawn from a normal distribution), but so far the network doesn't learn anything and the loss is stuck. Is it even possible, or am I just wasting my time?

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Model
from tensorflow.keras.layers import Input, Dense
import numpy as np
import matplotlib.pyplot as plt

n_hidden_units = 25
num_lay = 10
learning_rate = 0.01
batch_size = 1000
epochs = 1000
num_of_xs = 2

# Inputs and targets are drawn independently, so the targets carry no
# information about the inputs.
inputs_train = np.random.randn(batch_size * 10, num_of_xs)
outputs_train = np.random.randn(batch_size * 10, 1)  # alternative: np.sum(inputs_train, axis=1)

# Min-max normalize the targets to [0, 1] before converting to tensors.
inputs_train = tf.convert_to_tensor(inputs_train)
outputs_train = tf.convert_to_tensor(
    (outputs_train - outputs_train.min()) / (outputs_train.max() - outputs_train.min()))

# A deep stack of ReLU layers with uniformly initialized weights.
kernel_init = keras.initializers.RandomUniform(-0.25, 0.25)
inputs = Input(shape=(num_of_xs,))
x = Dense(n_hidden_units, kernel_initializer=kernel_init, activation='relu')(inputs)
for _ in range(num_lay):
    x = Dense(n_hidden_units, kernel_initializer=kernel_init, activation='relu')(x)

outputs = Dense(1, kernel_initializer=kernel_init, activation='linear')(x)
model = Model(inputs=inputs, outputs=outputs)

optimizer1 = keras.optimizers.Adam(learning_rate=learning_rate, beta_1=0.9,
                                   beta_2=0.999, amsgrad=True)

model.compile(loss='mse', optimizer=optimizer1)
model.fit(inputs_train, outputs_train, batch_size=batch_size, epochs=epochs,
          shuffle=False, verbose=2)

# Plot the targets (blue) against the model's predictions (red).
plt.plot(outputs_train, 'ob')
plt.plot(model(inputs_train), '*r')
plt.show()

So far I am getting very poor predictions (in red) relative to the target labels (in blue):



Solution

  • If you are using a validation split, you can't. Otherwise you can, but it will be hard, since good pipelines include regularization techniques that try to prevent exactly this from happening.

    Your target distribution is given by

    np.random.randn(batch_size*10,1)
    

    Then normalized to:

    (outputs_train-outputs_train.min())/(outputs_train.max()-outputs_train.min())
    

    As you can see, your targets are completely independent of your input x! So, if you have to predict the value y for a previously unseen x, there is literally nothing you can do better than simply predicting the mean value of y.
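
    A quick sanity check of this claim (a minimal numpy sketch, not from the original post): under squared error, the best constant prediction is the mean of y, and its MSE equals the variance of y. Any model that ignores x can at best tie this baseline.

    import numpy as np

    rng = np.random.default_rng(0)
    y = rng.standard_normal(10000)              # targets drawn independently of any x
    y = (y - y.min()) / (y.max() - y.min())     # same min-max scaling as in the question

    const = y.mean()                            # best constant predictor under MSE
    print(np.mean((y - const) ** 2), y.var())   # the two numbers coincide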

    In other words, your target distribution is a flat line y = avg + noise.

    Your question then becomes: can the network predict this extra noise? Well, no. That's why we call it noise: it consists of random deviations from the pattern that are completely unrelated to the input information we feed the network.

    BUT.

    If you do NOT use validation (that is, if you are only interested in the prediction error on the {x, y} pairs seen during training), then the network will learn the noise, up to its full prediction capacity (the more complex the network, the more it can adapt to complex noise; the sketch at the end of this answer shows this in action). This is precisely what we call overfitting, and it is a BAD thing!

    Normally we want models to predict something like "y = x * 2 + 3", whereas learning noise is more like learning a dictionary of unrelated predictions: "{x1: 2.93432, x2: -0.00324, ...}"
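
    To make the "dictionary" picture literal (an illustrative sketch, not part of the original answer): a plain lookup table achieves zero training error on random targets and is useless for any unseen x.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal((5, 2))
    y = rng.standard_normal(5)

    table = {tuple(xi): yi for xi, yi in zip(x, y)}  # memorize every training pair exactly

    print(table[tuple(x[0])])                        # perfect "prediction" for a seen x
    print(table.get(tuple(rng.standard_normal(2))))  # None: no answer for an unseen x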

    Because overfitting is bad (it makes predictions on unseen validation data worse, which means our models generalize worse to new data), pipelines include built-in techniques to fight the natural tendency of neural networks to do this. Such techniques include data augmentation (common for images), early stopping, dropout, and so on.
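
    For reference, this is what one such technique looks like in tf.keras (the callback and its arguments are standard Keras; the commented-out fit call is illustrative). Early stopping monitors validation loss and halts training before the network starts memorizing noise, which is exactly the behavior you would have to remove here:

    from tensorflow import keras

    # Stop training once validation loss has not improved for 10 epochs,
    # and roll back to the best weights seen so far.
    early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10,
                                               restore_best_weights=True)
    # model.fit(x, y, validation_split=0.2, epochs=1000, callbacks=[early_stop])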

    If you REALLY need to overfit your data, you will need to deactivate any such techniques and train for as long as you can (which is normally not something we want to do!).
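
    As a concrete illustration (a self-contained sketch; the hyperparameters are guesses, not tuned values): with a small dataset, a wide network, no validation split, and many epochs, the training MSE on purely random targets should fall well below the mean-baseline, i.e. the network memorizes the noise.

    import numpy as np
    from tensorflow import keras

    rng = np.random.default_rng(0)
    x = rng.standard_normal((500, 2))      # a small training set is easier to memorize
    y = rng.standard_normal((500, 1))      # random targets, unrelated to x

    model = keras.Sequential([
        keras.Input(shape=(2,)),
        keras.layers.Dense(256, activation='relu'),
        keras.layers.Dense(256, activation='relu'),
        keras.layers.Dense(1),
    ])
    model.compile(loss='mse', optimizer=keras.optimizers.Adam(1e-3))

    # No validation, no regularization, long training: pure memorization.
    model.fit(x, y, batch_size=100, epochs=3000, shuffle=True, verbose=0)
    print(model.evaluate(x, y, verbose=0), y.var())  # training MSE vs. baseline var(y)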