I am using pandas
to extract my data
. To get an idea of my data
I replicated an example dataset...
data = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
which yields a dataset of shape=(100,4)
...
A B C D
0 75 38 81 58
1 36 92 80 79
2 22 40 19 3
... ...
I am using tflearn
so I will need a target label as well. So I created a target label by extracting one of the columns from data
and then dropped it out of the data
variable (I also converted everything to numpy
arrays)...
# Target label used for training
labels = np.array(data['A'].values, dtype=np.float32)
# Reshape target label from (100,) to (100, 1)
labels = np.reshape(labels, (-1, 1))
# Data for training minus the target label.
data = np.array(data.drop('A', axis=1).values, dtype=np.float32)
Then I take the data
and the labels
and feed it into the DNN...
# Deep Neural Network.
net = tflearn.input_data(shape=[None, 3])
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 1, activation='softmax')
net = tflearn.regression(net)
# Define model.
model = tflearn.DNN(net)
model.fit(data, labels, n_epoch=10, batch_size=16, show_metric=True)
This seems like it should work, but the output I get is as follows...
Notice that the loss
remains at 0
, so I am definitely doing something wrong. I don't really know what form my data should be in. How can I get my training to work?
Your actual output is in range 0 to 100 while the activation softmax in the outermost layer outputs in range [0, 1]. You need to fix that. Also the default loss for tflearn.regression is categorical cross entropy which is used for classification problems and makes no sense in your scenario. You should try L2 loss. The reason you are getting zero error in this setting is that your network predicts 0 for all training examples and if you fit that value in formula for sigmoid cross entropy, loss indeed is zero. Here is its formula , where t[i] denotes the actual probabilities (which doesnt make sense in your problem) and o[i] is the predicted probabilities.
Here is more reasoning about why default choice of loss function is not suitable for your case