I am trying to train a neural network, for learning purposes, using tensorflow.keras.
The network should take in a row vector of size 100.
All values in the row vector are 0 except for a single element with value 1, and the network should return the position index of that element.
If the input row vector is [0,0,0,0,1,0,...,0,0], the output should be 5 (indexing starts at 1).
I have artificially created a training set of row vectors in which the position of the 1 lies within the range 30 to 70. Similarly, a test/validation set has been created for the range 10 to 90.
The problem is that the model fits only the training set and fails to learn the actual pattern, so the validation loss plateaus at a high value.
Theoretically, the optimum solution should be a single input layer (of size 100, taking in all row vector values) followed by a single-neuron layer whose weights/kernel and bias are [1,2,3,4,...,100] and [0]. So the question is: why is the descent not heading towards the optimal weights and bias, and how can I overcome this?
I have tried increasing the number of neurons and layers and decreasing the learning rate.
from tensorflow.keras import layers,models,optimizers
from tensorflow.keras.models import load_model
import numpy as np
#creating training set
blank=np.zeros((100,1),dtype='uint8')
x_train,y_train=[],[]
for k in range(0,1):
    for i in range(30,71):
        img=blank.copy()
        img[i]=1
        x_train.append(img)
        y_train.append(i+1)
x_train,y_train=np.array(x_train),np.array(y_train)
#creating test set
x_test,y_test=[],[]
for i in range(10,91):
    img=blank.copy()
    img[i]=1
    x_test.append(img)
    y_test.append(i+1)
x_test,y_test=np.array(x_test),np.array(y_test)
#defining and fitting the neural network
ann = models.Sequential([
    layers.Flatten(input_shape=(100,1)),  #each sample has shape (100,1)
    layers.Dense(units=1)])
#a metric is needed so that evaluate() returns a (loss, error) pair
ann.compile(optimizer=optimizers.Adam(learning_rate=0.1,beta_1=0.9,beta_2=0.999),loss='mean_squared_error',metrics=['mean_absolute_error'])
history=ann.fit(x_train, y_train,epochs=25,batch_size=4,validation_data=(x_test,y_test),verbose=0)
#printing the final outcomes
train_loss,train_error=ann.evaluate(x_train, y_train,verbose=0)
print(f'train set loss {train_loss:.2f} error {train_error:.2f}')
test_loss,test_error=ann.evaluate(x_test, y_test,verbose=0)
print(f'test set loss {test_loss:.2f} error {test_error:.2f}')
Your row vector of size 100 is fed into a neural network with a single neuron in a single layer and no activation. This is equivalent to a simple linear expression:
y=w1*x1+w2*x2+w3*x3+...+w100*x100+b
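To make the linear expression concrete, here is a minimal NumPy sketch of what a single neuron without activation computes. The helper name `predict` is hypothetical, and the weights and bias are the ones the question proposes as optimal ([1,2,...,100] and 0), not values taken from a trained model.

```python
import numpy as np

# Weights and bias the question calls optimal: w_k = k (1-based), b = 0.
w = np.arange(1, 101, dtype=float)
b = 0.0

def predict(x, w, b):
    """Output of a single linear neuron: w1*x1 + ... + w100*x100 + b."""
    return float(np.dot(w, x) + b)

# One-hot row vector with the 1 at 1-based position 5.
x = np.zeros(100)
x[4] = 1.0

print(predict(x, w, b))
```

Because the input is one-hot, the sum collapses to a single weight plus the bias, so each training sample only ever constrains one weight (together with b).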
Notice that there are 100 w variables and 1 b variable, but only 41 training samples, which can be treated as 41 equations.
Mathematically, with 101 variables and only 41 equations there are infinitely many solutions, and your neural network is simply finding one of them.
Starting from the random initial values of the weights and bias, your neural network latches onto the closest solution that holds for your 41 training samples.
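The point about infinitely many solutions can be checked without Keras. The sketch below is an illustration under simplifying assumptions: the bias is fixed at 0, and the second weight vector `w_b` is made up purely to demonstrate the idea (random values at positions the training set never touches). Both weight vectors fit the 41 training samples exactly, yet only one of them is correct on the test range.

```python
import numpy as np

# Training inputs: one-hot vectors with the 1 at 0-based indices 30..70.
idx_train = np.arange(30, 71)
X_train = np.eye(100)[idx_train]      # shape (41, 100)
y_train = idx_train + 1.0             # targets 31..71

# Test inputs: one-hot vectors with the 1 at 0-based indices 10..90.
idx_test = np.arange(10, 91)
X_test = np.eye(100)[idx_test]        # shape (81, 100)
y_test = idx_test + 1.0

# Solution A: the weights the question expects, w_k = k, with b = 0.
w_a = np.arange(1, 101, dtype=float)

# Solution B (made up for illustration): correct on the trained
# positions, arbitrary random values everywhere else.
rng = np.random.default_rng(0)
w_b = rng.normal(size=100)
w_b[idx_train] = y_train

def mse(w, X, y):
    """Mean squared error of the linear model X @ w (bias assumed 0)."""
    return float(np.mean((X @ w - y) ** 2))

train_a, train_b = mse(w_a, X_train, y_train), mse(w_b, X_train, y_train)
test_a, test_b = mse(w_a, X_test, y_test), mse(w_b, X_test, y_test)
print("train MSE:", train_a, train_b)
print("test MSE: ", test_a, test_b)
```

The training loss cannot tell the two solutions apart, so nothing in the training data pushes the optimizer towards solution A rather than B.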
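In conclusion, you cannot reach your so-called optimum solution with conventional gradient descent from this training set alone: the weights for positions that never contain a 1 during training receive zero gradient and stay at their initial values.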
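The conclusion can be reproduced with a few lines of NumPy. This is only a sketch: it uses plain full-batch gradient descent on the mean squared error instead of Adam, and the learning rate, step count, and initialization scale are arbitrary choices made for the illustration, not values from the question.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same training set as in the question: 1 at 0-based indices 30..70.
idx_train = np.arange(30, 71)
X = np.eye(100)[idx_train]            # shape (41, 100)
y = idx_train + 1.0                   # targets 31..71

# Small random initial weights, zero bias (illustrative choice).
w = rng.normal(scale=0.1, size=100)
b = 0.0
w_init = w.copy()

lr = 0.5                              # assumed learning rate
for _ in range(2000):
    err = X @ w + b - y               # residuals, shape (41,)
    grad_w = 2.0 * X.T @ err / len(y) # zero wherever the input is always 0
    grad_b = 2.0 * err.mean()
    w -= lr * grad_w
    b -= lr * grad_b

# Positions that never held the 1 during training.
untrained = np.setdiff1d(np.arange(100), idx_train)
unchanged = np.array_equal(w[untrained], w_init[untrained])
final_loss = float(np.mean((X @ w + b - y) ** 2))

print("untrained weights unchanged:", unchanged)
print("final training loss:", final_loss)
```

The training loss goes to essentially zero while the 59 weights outside the trained range are exactly where the random initialization left them, which is why the predictions for positions 10 to 29 and 71 to 90 are wrong.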