Tags: python, machine-learning, tensorflow, tflearn

Python TFlearn - Loss too high


After fixing my input shape problem, I ran my program, but the total loss it prints is far too high (compared, for example, to the one from the quickstart tutorial).

My goal is to predict the congestion of a future entry using past data (I have more than 10M entries, each tagged with its score), so I shouldn't have a problem with training data.

Here is my code:

import numpy as np
import tflearn

# Load CSV file, indicate that the first column represents labels
from tflearn.data_utils import load_csv
data, labels = load_csv('nowcastScaled.csv', has_header=True, n_classes=2)

# Preprocessing function
def preprocess(data):
    return np.array(data, dtype=np.float32)

# Preprocess data
data = preprocess(data)

# Build neural network
net = tflearn.input_data(shape=[None, 2])
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 2, activation='linear')
data = np.reshape(data, (-1, 2))
labels = np.reshape(labels, (-1, 2))
net = tflearn.regression(net, optimizer='adam', learning_rate=0.001,
                         loss='categorical_crossentropy')

# Define model
model = tflearn.DNN(net)
# Start training (apply gradient descent algorithm)
model.fit(data, labels, n_epoch=15, batch_size=16, show_metric=True)

# Save and reload the model, then evaluate
model.save('test_Model')
model.load('test_Model')
score = model.evaluate(data, labels, batch_size=16)

My CSV file looks like this (2 columns, 100,000 lines):

calculed_at , congestion
1 , 56
2 , 21

This is what the results look like (15 epochs):

Training samples: 50000
Validation samples: 0
....
--
Training Step: 40625  | total loss: 15.27961 | time: 17.659s
| Adam | epoch: 013 | loss: 15.27961 - acc: 0.7070 -- iter: 50000/50000
--
Training Step: 43750  | total loss: 15.66268 | time: 17.549s
| Adam | epoch: 014 | loss: 15.66268 - acc: 0.7247 -- iter: 50000/50000
--
Training Step: 46875  | total loss: 15.94696 | time: 18.037s
| Adam | epoch: 015 | loss: 15.94696 - acc: 0.7581 -- iter: 50000/50000
--

Do you have any idea what could cause such a high loss? It seems strange, since the printed accuracy doesn't look too bad. Thank you for your help.

Edit: It seems I picked a good moment when I recorded those values, because when I tried again just now the total loss exceeded 280 (with an accuracy below 0.3, or barely above).


Solution

  • For time series, construct the input/output samples by sliding a time window over the data. If the samples in a window are {0, 1, ..., N}, take the first N samples as input and the last sample as output. You can then train a regression model to predict the next time step.
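
A minimal NumPy sketch of this windowing idea (the `make_windows` helper and the window size of 3 are illustrative choices, not part of the original code):

```python
import numpy as np

def make_windows(series, window_size):
    """Slide a window over a 1-D series: each sample's input is
    `window_size` consecutive values, its target the next value."""
    X, y = [], []
    for i in range(len(series) - window_size):
        X.append(series[i:i + window_size])
        y.append(series[i + window_size])
    return np.array(X, dtype=np.float32), np.array(y, dtype=np.float32)

# Toy congestion series
congestion = [56, 21, 33, 40, 52, 61]
X, y = make_windows(congestion, window_size=3)
print(X)  # inputs:  [[56 21 33] [21 33 40] [33 40 52]]
print(y)  # targets: [40 52 61]
```

With samples shaped this way, the network's last layer would have a single linear output unit, and a regression loss such as `mean_square` is a better fit than `categorical_crossentropy`, which expects one-hot class labels rather than raw congestion scores.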