I've been stuck for several days on the problem I'm going to describe. I'm following Daniel Nouri's tutorial on deep learning: http://danielnouri.org/notes/category/deep-learning/ and I tried to adapt his example to a classification dataset. My problem is that if I treat the dataset as a regression problem it works properly, but if I try to perform a classification it fails. I tried to write two reproducible examples.
1) Regression (it works well)
import lasagne
from sklearn import datasets
import numpy as np
from lasagne import layers
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data[iris.target < 2]  # we only take the first two classes
Y = iris.target[iris.target < 2]
stdscaler = StandardScaler(copy=True, with_mean=True, with_std=True)
X = stdscaler.fit_transform(X).astype(np.float32)
y = np.asmatrix((Y - 0.5) * 2).T.astype(np.float32)  # map the 0/1 labels to -1/+1

print X.shape, type(X)
print y.shape, type(y)

net1 = NeuralNet(
    layers=[  # three layers: one hidden layer
        ('input', layers.InputLayer),
        ('hidden', layers.DenseLayer),
        ('output', layers.DenseLayer),
        ],
    # layer parameters:
    input_shape=(None, 4),  # 4 input features per sample
    hidden_num_units=10,  # number of units in hidden layer
    output_nonlinearity=None,  # output layer uses identity function
    output_num_units=1,  # 1 target value

    # optimization method:
    update=nesterov_momentum,
    update_learning_rate=0.01,
    update_momentum=0.9,

    regression=True,  # flag to indicate we're dealing with a regression problem
    max_epochs=400,  # we want to train this many epochs
    verbose=1,
    )

net1.fit(X, y)
2) Classification (it raises a matrix-dimensionality error; I paste it below)
import lasagne
from sklearn import datasets
import numpy as np
from lasagne import layers
from lasagne.nonlinearities import softmax
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data[iris.target < 2]  # we only take the first two classes
Y = iris.target[iris.target < 2]
stdscaler = StandardScaler(copy=True, with_mean=True, with_std=True)
X = stdscaler.fit_transform(X).astype(np.float32)
y = np.asmatrix((Y - 0.5) * 2).T.astype(np.int32)  # map the 0/1 labels to -1/+1

print X.shape, type(X)
print y.shape, type(y)

net1 = NeuralNet(
    layers=[  # three layers: one hidden layer
        ('input', layers.InputLayer),
        ('hidden', layers.DenseLayer),
        ('output', layers.DenseLayer),
        ],
    # layer parameters:
    input_shape=(None, 4),  # 4 input features per sample
    hidden_num_units=10,  # number of units in hidden layer
    output_nonlinearity=softmax,  # output layer uses softmax
    output_num_units=1,  # 1 target value

    # optimization method:
    update=nesterov_momentum,
    update_learning_rate=0.01,
    update_momentum=0.9,

    regression=False,  # flag to indicate we're dealing with a classification problem
    max_epochs=400,  # we want to train this many epochs
    verbose=1,
    )

net1.fit(X, y)
This is the failing output I get with code 2:
(100, 4) <type 'numpy.ndarray'>
(100, 1) <type 'numpy.ndarray'>
input (None, 4) produces 4 outputs
hidden (None, 10) produces 10 outputs
output (None, 1) produces 1 outputs
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-13-184a45e5abaa> in <module>()
40 )
41
---> 42 net1.fit(X, y)
/Users/ivanvallesperez/anaconda/lib/python2.7/site-packages/nolearn/lasagne/base.pyc in fit(self, X, y)
291
292 try:
--> 293 self.train_loop(X, y)
294 except KeyboardInterrupt:
295 pass
/Users/ivanvallesperez/anaconda/lib/python2.7/site-packages/nolearn/lasagne/base.pyc in train_loop(self, X, y)
298 def train_loop(self, X, y):
299 X_train, X_valid, y_train, y_valid = self.train_test_split(
--> 300 X, y, self.eval_size)
301
302 on_epoch_finished = self.on_epoch_finished
/Users/ivanvallesperez/anaconda/lib/python2.7/site-packages/nolearn/lasagne/base.pyc in train_test_split(self, X, y, eval_size)
399 kf = KFold(y.shape[0], round(1. / eval_size))
400 else:
--> 401 kf = StratifiedKFold(y, round(1. / eval_size))
402
403 train_indices, valid_indices = next(iter(kf))
/Users/ivanvallesperez/anaconda/lib/python2.7/site-packages/sklearn/cross_validation.pyc in __init__(self, y, n_folds, shuffle, random_state)
531 for test_fold_idx, per_label_splits in enumerate(zip(*per_label_cvs)):
532 for label, (_, test_split) in zip(unique_labels, per_label_splits):
--> 533 label_test_folds = test_folds[y == label]
534 # the test split can be too big because we used
535 # KFold(max(c, self.n_folds), self.n_folds) instead of
IndexError: too many indices for array
What is going on here? Am I doing something wrong? I think I've tried everything, but I am not able to figure out what is happening.
Note that I just updated my lasagne and its dependencies today using the command: pip install -r https://raw.githubusercontent.com/dnouri/kfkd-tutorial/master/requirements.txt
Thanks in advance
I managed to make it work with the following changes, but I still have some doubts:

1) I defined y as a one-dimensional vector of 0/1 values: y = Y.astype(np.int32)

2) I had to change output_num_units=1 to output_num_units=2, and I'm not sure I understand why: since this is a binary classification problem, I thought this multilayer perceptron should have only 1 output neuron, not 2. Am I wrong? (The working version is sketched below.)
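For reference, this is roughly the variant that trains for me; the only changes relative to example 2 are the one-dimensional integer y and the two output units (it assumes the same imports and preprocessing as above):

y = Y.astype(np.int32)  # one-dimensional vector of 0/1 class labels

net1 = NeuralNet(
    layers=[
        ('input', layers.InputLayer),
        ('hidden', layers.DenseLayer),
        ('output', layers.DenseLayer),
        ],
    input_shape=(None, 4),
    hidden_num_units=10,
    output_nonlinearity=softmax,  # softmax over the class scores
    output_num_units=2,           # one unit per class, even for a binary problem
    update=nesterov_momentum,
    update_learning_rate=0.01,
    update_momentum=0.9,
    regression=False,
    max_epochs=400,
    verbose=1,
    )

net1.fit(X, y)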
I also tried to change the cost function to ROC-AUC. I know there's a parameter called objective_loss_function, which is defined as objective_loss_function=lasagne.objectives.categorical_crossentropy by default, but... how can I use ROC-AUC as the cost function instead of categorical crossentropy?
Thanks
In nolearn, if you do classification, output_num_units is the number of classes you have. While it is possible to implement two-class classification with only one output unit, nolearn does not special-case it that way, which follows, for example, from [1]:

if not self.regression:
    predict = predict_proba.argmax(axis=1)

Note how the prediction is always the argmax, no matter how many classes you have (implying that two-class classification has two outputs, not one).
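To make the argmax behaviour concrete, here is a small NumPy illustration with made-up probabilities:

import numpy as np

predict_proba = np.array([[0.9, 0.1],   # sample 0: class 0 is more likely
                          [0.2, 0.8],   # sample 1: class 1 is more likely
                          [0.4, 0.6]])  # sample 2: class 1 is more likely
print predict_proba.argmax(axis=1)      # prints [0 1 1]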
So your changes are correct: output_num_units should always be the number of classes you have, even if you have two, and Y should have a shape of either (num_samples) or (num_samples, 1), containing integer values that represent the categories, as opposed to, for example, a vector with one bit per category with shape (num_samples, num_categories).
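In other words, with made-up labels, nolearn expects the integer encoding, not a bit-per-category (one-hot) encoding:

import numpy as np

# what nolearn expects: one integer label per sample, shape (num_samples,)
y_int = np.array([0, 1, 1, 0], dtype=np.int32)

# NOT this: one bit per category, shape (num_samples, num_categories)
y_onehot = np.array([[1, 0],
                     [0, 1],
                     [0, 1],
                     [1, 0]], dtype=np.int32)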
Answering your other question: Lasagne doesn't seem to have a ROC-AUC objective, so you will need to implement it yourself. Note that you cannot use the implementation from scikit-learn, for example, because Lasagne requires the objective function to take Theano tensors as arguments, not lists or ndarrays. To see how an objective function is implemented in Lasagne, you can take a look at the existing objective functions [2]. Many of them defer to functions inside Theano; you can see their implementations in [3] (the link autoscrolls to binary_crossentropy, which is a good example of an objective function).
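Be aware that ROC-AUC itself is a count of correctly ranked positive/negative pairs, so it is a step function with no useful gradient; in practice you would optimize a smooth surrogate of it. As a rough sketch, not a tested drop-in objective (the name approx_auc_loss is mine, and I'm assuming predictions is the (batch_size, 2) softmax output and targets is the vector of 0/1 integer labels), it could look like this:

import theano.tensor as T

def approx_auc_loss(predictions, targets):
    # Smooth surrogate for (1 - AUC): AUC counts pairs where a positive
    # sample is scored above a negative one, so we replace the 0/1 step
    # on each pair with a sigmoid to make it differentiable.
    scores = predictions[:, 1]                    # score of the positive class
    pos = scores[T.eq(targets, 1).nonzero()]      # scores of positive samples
    neg = scores[T.eq(targets, 0).nonzero()]      # scores of negative samples
    # all pairwise differences pos_i - neg_j
    diffs = pos.dimshuffle(0, 'x') - neg.dimshuffle('x', 0)
    # sigmoid(-diff) -> 0 for well-ranked pairs, -> 1 for badly ranked ones
    return T.mean(T.nnet.sigmoid(-diffs))

You would then pass objective_loss_function=approx_auc_loss to NeuralNet. Note that every batch needs samples of both classes for the pairwise differences to exist; a common alternative is to keep training on crossentropy and simply monitor ROC-AUC with scikit-learn on the predicted probabilities.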
[1] https://github.com/dnouri/nolearn/blob/master/nolearn/lasagne/base.py#L414
[2] https://github.com/Lasagne/Lasagne/blob/master/lasagne/objectives.py
[3] https://github.com/Theano/Theano/blob/master/theano/tensor/nnet/nnet.py#L1809