I've been stuck for several days on the problem I'm going to describe. I'm following Daniel Nouri's tutorial on deep learning: http://danielnouri.org/notes/category/deep-learning/ and I tried to adapt his example to a classification dataset. My problem is that if I treat the dataset as a regression problem it works properly, but if I try to perform a classification it fails. I tried to write two reproducible examples.
1) Regression (it works well)
import lasagne
from sklearn import datasets
import numpy as np
from lasagne import layers
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data[iris.target < 2]  # we only take the first two classes
Y = iris.target[iris.target < 2]
stdscaler = StandardScaler(copy=True, with_mean=True, with_std=True)
X = stdscaler.fit_transform(X).astype(np.float32)
y = np.asmatrix((Y - 0.5) * 2).T.astype(np.float32)  # map the 0/1 labels to -1/+1

print X.shape, type(X)
print y.shape, type(y)

net1 = NeuralNet(
    layers=[  # three layers: one hidden layer
        ('input', layers.InputLayer),
        ('hidden', layers.DenseLayer),
        ('output', layers.DenseLayer),
        ],
    # layer parameters:
    input_shape=(None, 4),  # 4 input features per sample
    hidden_num_units=10,  # number of units in hidden layer
    output_nonlinearity=None,  # output layer uses identity function
    output_num_units=1,  # 1 target value

    # optimization method:
    update=nesterov_momentum,
    update_learning_rate=0.01,
    update_momentum=0.9,

    regression=True,  # flag to indicate we're dealing with a regression problem
    max_epochs=400,  # we want to train this many epochs
    verbose=1,
    )

net1.fit(X, y)
2) Classification (it raises a matrix-dimensionality error; I paste it below)
import lasagne
from sklearn import datasets
import numpy as np
from lasagne import layers
from lasagne.nonlinearities import softmax
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data[iris.target < 2]  # we only take the first two classes
Y = iris.target[iris.target < 2]
stdscaler = StandardScaler(copy=True, with_mean=True, with_std=True)
X = stdscaler.fit_transform(X).astype(np.float32)
y = np.asmatrix((Y - 0.5) * 2).T.astype(np.int32)  # map the 0/1 labels to -1/+1

print X.shape, type(X)
print y.shape, type(y)

net1 = NeuralNet(
    layers=[  # three layers: one hidden layer
        ('input', layers.InputLayer),
        ('hidden', layers.DenseLayer),
        ('output', layers.DenseLayer),
        ],
    # layer parameters:
    input_shape=(None, 4),  # 4 input features per sample
    hidden_num_units=10,  # number of units in hidden layer
    output_nonlinearity=softmax,  # output layer uses softmax
    output_num_units=1,  # 1 target value

    # optimization method:
    update=nesterov_momentum,
    update_learning_rate=0.01,
    update_momentum=0.9,

    regression=False,  # flag to indicate we're dealing with a classification problem
    max_epochs=400,  # we want to train this many epochs
    verbose=1,
    )

net1.fit(X, y)
This is the failing output I get with code 2:
(100, 4) <type 'numpy.ndarray'>
(100, 1) <type 'numpy.ndarray'>
input (None, 4) produces 4 outputs
hidden (None, 10) produces 10 outputs
output (None, 1) produces 1 outputs
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-13-184a45e5abaa> in <module>()
40 )
41
---> 42 net1.fit(X, y)
/Users/ivanvallesperez/anaconda/lib/python2.7/site-packages/nolearn/lasagne/base.pyc in fit(self, X, y)
291
292 try:
--> 293 self.train_loop(X, y)
294 except KeyboardInterrupt:
295 pass
/Users/ivanvallesperez/anaconda/lib/python2.7/site-packages/nolearn/lasagne/base.pyc in train_loop(self, X, y)
298 def train_loop(self, X, y):
299 X_train, X_valid, y_train, y_valid = self.train_test_split(
--> 300 X, y, self.eval_size)
301
302 on_epoch_finished = self.on_epoch_finished
/Users/ivanvallesperez/anaconda/lib/python2.7/site-packages/nolearn/lasagne/base.pyc in train_test_split(self, X, y, eval_size)
399 kf = KFold(y.shape[0], round(1. / eval_size))
400 else:
--> 401 kf = StratifiedKFold(y, round(1. / eval_size))
402
403 train_indices, valid_indices = next(iter(kf))
/Users/ivanvallesperez/anaconda/lib/python2.7/site-packages/sklearn/cross_validation.pyc in __init__(self, y, n_folds, shuffle, random_state)
531 for test_fold_idx, per_label_splits in enumerate(zip(*per_label_cvs)):
532 for label, (_, test_split) in zip(unique_labels, per_label_splits):
--> 533 label_test_folds = test_folds[y == label]
534 # the test split can be too big because we used
535 # KFold(max(c, self.n_folds), self.n_folds) instead of
IndexError: too many indices for array
What is going on here? Am I doing something wrong? I think I've tried everything, but I am not able to figure out what is happening.
Note that I just updated my lasagne and its dependencies today using the command: pip install -r https://raw.githubusercontent.com/dnouri/kfkd-tutorial/master/requirements.txt
Thanks in advance
I managed to make it work with the following changes, but I still have some doubts:

1) I defined y as a one-dimensional vector of 0/1 values: y = Y.astype(np.int32)

2) I had to change output_num_units=1 to output_num_units=2, and I'm not sure I understand why: since this is a binary classification problem, I thought this multilayer perceptron should have only 1 output neuron, not 2. Am I wrong? (The working version is sketched below.)
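For reference, this is roughly the variant that trains for me; the only changes relative to example 2 are the one-dimensional integer y and the two output units (it assumes the same imports and preprocessing as above):

y = Y.astype(np.int32)  # one-dimensional vector of 0/1 class labels

net1 = NeuralNet(
    layers=[
        ('input', layers.InputLayer),
        ('hidden', layers.DenseLayer),
        ('output', layers.DenseLayer),
        ],
    input_shape=(None, 4),
    hidden_num_units=10,
    output_nonlinearity=softmax,  # softmax over the class scores
    output_num_units=2,           # one unit per class, even for a binary problem
    update=nesterov_momentum,
    update_learning_rate=0.01,
    update_momentum=0.9,
    regression=False,
    max_epochs=400,
    verbose=1,
    )

net1.fit(X, y)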
I also tried to change the cost function to ROC-AUC. I know there's a parameter called objective_loss_function, which is defined as objective_loss_function=lasagne.objectives.categorical_crossentropy by default, but... how can I use ROC-AUC as the cost function instead of categorical crossentropy?
Thanks
In nolearn, if you do classification, output_num_units is the number of classes you have. While it is possible to implement two-class classification with only one output unit, nolearn does not special-case it that way, which follows, for example, from [1]:

if not self.regression:
    predict = predict_proba.argmax(axis=1)

Note how the prediction is always the argmax, no matter how many classes you have (implying that two-class classification has two outputs, not one).
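To make the argmax behaviour concrete, here is a small NumPy illustration with made-up probabilities:

import numpy as np

predict_proba = np.array([[0.9, 0.1],   # sample 0: class 0 is more likely
                          [0.2, 0.8],   # sample 1: class 1 is more likely
                          [0.4, 0.6]])  # sample 2: class 1 is more likely
print predict_proba.argmax(axis=1)      # prints [0 1 1]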
So your changes are correct: output_num_units should always be the number of classes you have, even if you have two, and Y should have a shape of either (num_samples) or (num_samples, 1), containing integer values that represent the categories, as opposed to, for example, a vector with one bit per category with shape (num_samples, num_categories).
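In other words, with made-up labels, nolearn expects the integer encoding, not a bit-per-category (one-hot) encoding:

import numpy as np

# what nolearn expects: one integer label per sample, shape (num_samples,)
y_int = np.array([0, 1, 1, 0], dtype=np.int32)

# NOT this: one bit per category, shape (num_samples, num_categories)
y_onehot = np.array([[1, 0],
                     [0, 1],
                     [0, 1],
                     [1, 0]], dtype=np.int32)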
Answering your other question: Lasagne doesn't seem to have a ROC-AUC objective, so you will need to implement it yourself. Note that you cannot use the implementation from scikit-learn, for example, because Lasagne requires the objective function to take Theano tensors as arguments, not lists or ndarrays. To see how an objective function is implemented in Lasagne, you can take a look at the existing objective functions [2]. Many of them defer to functions inside Theano; you can see their implementations in [3] (the link autoscrolls to binary_crossentropy, which is a good example of an objective function).
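Be aware that ROC-AUC itself is a count of correctly ranked positive/negative pairs, so it is a step function with no useful gradient; in practice you would optimize a smooth surrogate of it. As a rough sketch, not a tested drop-in objective (the name approx_auc_loss is mine, and I'm assuming predictions is the (batch_size, 2) softmax output and targets is the vector of 0/1 integer labels), it could look like this:

import theano.tensor as T

def approx_auc_loss(predictions, targets):
    # Smooth surrogate for (1 - AUC): AUC counts pairs where a positive
    # sample is scored above a negative one, so we replace the 0/1 step
    # on each pair with a sigmoid to make it differentiable.
    scores = predictions[:, 1]                    # score of the positive class
    pos = scores[T.eq(targets, 1).nonzero()]      # scores of positive samples
    neg = scores[T.eq(targets, 0).nonzero()]      # scores of negative samples
    # all pairwise differences pos_i - neg_j
    diffs = pos.dimshuffle(0, 'x') - neg.dimshuffle('x', 0)
    # sigmoid(-diff) -> 0 for well-ranked pairs, -> 1 for badly ranked ones
    return T.mean(T.nnet.sigmoid(-diffs))

You would then pass objective_loss_function=approx_auc_loss to NeuralNet. Note that every batch needs samples of both classes for the pairwise differences to exist; a common alternative is to keep training on crossentropy and simply monitor ROC-AUC with scikit-learn on the predicted probabilities.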
[1] https://github.com/dnouri/nolearn/blob/master/nolearn/lasagne/base.py#L414
[2] https://github.com/Lasagne/Lasagne/blob/master/lasagne/objectives.py
[3] https://github.com/Theano/Theano/blob/master/theano/tensor/nnet/nnet.py#L1809