I am new to TensorFlow and am trying to build a neural network model in TensorFlow to solve a task scheduling problem.
I built the model with 2 hidden layers, 36 nodes in the input layer, and 22 nodes in the output layer. All values in the input and output layers are normalized floating-point numbers (between 0.0 and 1.0). I followed an online example to build the model, since I need to import data from a CSV file: http://tneal.org/post/tensorflow-iris/TensorFlowIris/
I was initially using 9 training samples and got overfitted results, so I increased the number of samples to 1000. But then the results became strange, and the model does not even overfit anymore: when the same data set is used for both training and testing, the predicted and actual output values do not match.
When I adjusted the learning rate, the predicted results changed, and I even got some negative or very large values. I also tried changing the optimizer, the number of nodes in the hidden layers, and the cost function, but still did not see any improvement.
Here is the script I wrote in Python:
import csv
import tensorflow as tf
import numpy as np
import pandas as pd
resource_file = "testGraphs/testgraph_input_output_CCR_1.0_Norm.csv"
respd = pd.read_csv(resource_file)
#print(respd.head())
n_nodes = 12
n_nodes_hl1 = 30
n_nodes_hl2 = 25
n_classes = n_nodes*2-2
#batch_size = 100
# shuffle the samples; use all of them for training and the last two rows for testing
shuffled_res = respd.sample(frac=1)
trainSet_res = shuffled_res[0:len(shuffled_res)]
testSet_res = shuffled_res[len(shuffled_res)-2:]
x = tf.placeholder('float32',[None,n_nodes*3])
y = tf.placeholder('float32',[None,n_classes])
def nerual_network_model(data):
    hidden_1_layer = {'weights': tf.Variable(tf.random_normal([n_nodes*3, n_nodes_hl1])),
                      'biases': tf.Variable(tf.random_normal([n_nodes_hl1]))}
    hidden_2_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2])),
                      'biases': tf.Variable(tf.random_normal([n_nodes_hl2]))}
    output_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl2, n_classes])),
                    'biases': tf.Variable(tf.random_normal([n_classes]))}

    # input_data * weights + biases
    l1 = tf.add(tf.matmul(data, hidden_1_layer['weights']), hidden_1_layer['biases'])
    l1 = tf.nn.relu(l1)

    l2 = tf.add(tf.matmul(l1, hidden_2_layer['weights']), hidden_2_layer['biases'])
    l2 = tf.nn.relu(l2)

    output = tf.matmul(l2, output_layer['weights']) + output_layer['biases']
    return output
def train_nerual_network(x):
    prediction = nerual_network_model(x)
    #cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(prediction,y))
    cost = tf.reduce_mean(tf.square(prediction - y))   # mean squared error
    #cost = tf.pow(prediction-y,2)
    optimizer = tf.train.AdamOptimizer(0.001).minimize(cost)

    sess = tf.InteractiveSession()
    tf.global_variables_initializer().run()

    input_labels = ['In0','Weight0','Out0','In1','Weight1','Out1','In2','Weight2','Out2',
                    'In3','Weight3','Out3','In4','Weight4','Out4','In5','Weight5','Out5',
                    'In6','Weight6','Out6','In7','Weight7','Out7','In8','Weight8','Out8',
                    'In9','Weight9','Out9','In10','Weight10','Out10','In11','Weight11','Out11']
    output_labels = ['ProcessorForNode1','StartingTime1','ProcessorForNode2','StartingTime2',
                     'ProcessorForNode3','StartingTime3','ProcessorForNode4','StartingTime4',
                     'ProcessorForNode5','StartingTime5','ProcessorForNode6','StartingTime6',
                     'ProcessorForNode7','StartingTime7','ProcessorForNode8','StartingTime8',
                     'ProcessorForNode9','StartingTime9','ProcessorForNode10','StartingTime10',
                     'ProcessorForNode11','StartingTime11']

    # train on random batches of 100 samples for 1000 iterations
    for i in range(1000):
        train_res = trainSet_res.sample(100)
        sess.run(optimizer, feed_dict={x: train_res[input_labels].values,
                                       y: train_res[output_labels].values})

    #correct = tf.equal(tf.argmax(prediction,1),tf.argmax(y,1))
    #accuracy = tf.reduce_mean(tf.cast(correct,'float32'))
    #print(sess.run(accuracy, feed_dict={x: testSet_res[input_labels].values,
    #                                    y: testSet_res[output_labels].values}))

    # predicted outputs for the test rows
    print(sess.run(prediction, feed_dict={x: testSet_res[input_labels].values}))
    # actual outputs for the test rows (running y just echoes the fed labels)
    print(sess.run(y, feed_dict={y: testSet_res[output_labels].values}))

train_nerual_network(x)
Here is the result (predicted values above, actual values below):
Can someone tell me what may be causing the problems in this model? Thank you.
This is a very broad question; your problem could be related to many things (the data, optimization that is not tuned and/or did not converge, an insufficient model design, bugs in the implementation, etc.).
One possibility is to take a step back and implement the simplest model you can think of that could still solve the problem. For example, if you have a regression problem (which I am not sure you do: the 'StartingTimeX' outputs might be a regression target, but the 'ProcessorForNodeX' outputs look more like a classification problem), you could start off with a simple linear regression model, as sketched below. If that model gives you results that are good enough for your application, there is nothing more to do. Generally in machine learning, the simplest model that solves the task at hand is the one you should aim for (Occam's razor).
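As a rough illustration only, here is a minimal linear-regression baseline in the same TF1 style as your script. It assumes the respd DataFrame, the trainSet_res/testSet_res split, and the input_labels/output_labels lists from your question are available at module scope, and for simplicity it treats all 22 outputs as regression targets:

import tensorflow as tf

x = tf.placeholder('float32', [None, 36])
y = tf.placeholder('float32', [None, 22])

# single linear layer: y_hat = x * W + b (no hidden layers, no activation)
W = tf.Variable(tf.zeros([36, 22]))
b = tf.Variable(tf.zeros([22]))
prediction = tf.matmul(x, W) + b

cost = tf.reduce_mean(tf.square(prediction - y))
optimizer = tf.train.GradientDescentOptimizer(0.1).minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(1000):
        batch = trainSet_res.sample(100)
        _, c = sess.run([optimizer, cost],
                        feed_dict={x: batch[input_labels].values,
                                   y: batch[output_labels].values})
    # compare predictions against the held-out rows
    print(sess.run(prediction, feed_dict={x: testSet_res[input_labels].values}))
    print(testSet_res[output_labels].values)

If even this baseline behaves erratically, the issue is more likely in the data or the feeding code than in the network architecture.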
If the simple model is not sufficient (e.g. because it does not generalize well enough), you can think about how to improve the results, for example by moving to more sophisticated models.
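One such direction, sketched below under assumptions of my own (it is not your current setup), is to give the two kinds of targets separate heads: a classification head with softmax cross-entropy for the 'ProcessorForNodeX' outputs and a regression head with squared error for the 'StartingTimeX' outputs. This assumes a hypothetical processor count n_processors and that the processor columns are re-encoded as integer class indices (y_proc) rather than normalized floats; it also uses tf.layers.dense, which is available in recent TF 1.x releases:

import tensorflow as tf

n_inputs, n_tasks, n_processors = 36, 11, 4   # n_processors is an assumption

x = tf.placeholder('float32', [None, n_inputs])
y_proc = tf.placeholder('int32', [None, n_tasks])     # processor index per task
y_time = tf.placeholder('float32', [None, n_tasks])   # start time per task

hidden = tf.layers.dense(x, 30, activation=tf.nn.relu)

# classification head: logits over processors for each task
proc_logits = tf.reshape(tf.layers.dense(hidden, n_tasks * n_processors),
                         [-1, n_tasks, n_processors])
proc_loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_proc,
                                                   logits=proc_logits))

# regression head: predicted start time for each task
time_pred = tf.layers.dense(hidden, n_tasks)
time_loss = tf.reduce_mean(tf.square(time_pred - y_time))

total_loss = proc_loss + time_loss
train_op = tf.train.AdamOptimizer(0.001).minimize(total_loss)

Only take a step like this after the simple baseline has told you where the current model actually falls short.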