Tags: machine-learning, scikit-learn, neural-network, softmax

How scikit learn implements the output layer


  1. In scikit-learn, how many neurons are in the output layer? As stated here, you can only specify the hidden layers and the number of neurons in each, but nothing about the output layer, so I am not sure how scikit-learn implements it.
  2. Does it make sense to use the softmax activation function in an output layer that has only a single neuron?

Solution

  • Test:

    Setup:

    In [227]: %paste
    import numpy as np
    import pandas as pd
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split

    clf = MLPClassifier()
    
    m = 10**3
    n = 64
    
    df = pd.DataFrame(np.random.randint(100, size=(m, n))).add_prefix('x') \
           .assign(y=np.random.choice([-1,1], m))
    
    
    X_train, X_test, y_train, y_test = \
        train_test_split(df.drop(columns='y'), df['y'], test_size=0.2, random_state=33)
    
    clf.fit(X_train, y_train)
    ## -- End pasted text --
    Out[227]:
    MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
           beta_2=0.999, early_stopping=False, epsilon=1e-08,
           hidden_layer_sizes=(100,), learning_rate='constant',
           learning_rate_init=0.001, max_iter=200, momentum=0.9,
           nesterovs_momentum=True, power_t=0.5, random_state=None,
           shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
           verbose=False, warm_start=False)
    

    Number of outputs:

    In [229]: clf.n_outputs_
    Out[229]: 1
    

    Number of layers:

    In [228]: clf.n_layers_
    Out[228]: 3
    

    The number of iterations the solver has run:

    In [230]: clf.n_iter_
    Out[230]: 60
    
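    So for the binary target above, the output layer has exactly one neuron. A small sketch (with made-up random data, not the session above) showing that `n_outputs_` grows to one neuron per class once there are more than two classes:

    ```python
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.RandomState(33)
    X = rng.randint(100, size=(300, 64))

    # Two classes -> a single output neuron
    clf_bin = MLPClassifier(max_iter=50).fit(X, rng.choice([-1, 1], 300))
    print(clf_bin.n_outputs_)    # 1

    # Three classes -> three output neurons (one-vs-all binarization)
    clf_multi = MLPClassifier(max_iter=50).fit(X, rng.choice([0, 1, 2], 300))
    print(clf_multi.n_outputs_)  # 3
    ```
    
    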

    Here is an excerpt of the source code where the activation function for the output layer is chosen:

        # Output for regression
        if not is_classifier(self):
            self.out_activation_ = 'identity'
        # Output for multi class
        elif self._label_binarizer.y_type_ == 'multiclass':
            self.out_activation_ = 'softmax'
        # Output for binary class and multi-label
        else:
            self.out_activation_ = 'logistic'
    
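    You can verify those three branches on fitted estimators via the `out_activation_` attribute (again with made-up random data, just to exercise each case):

    ```python
    import numpy as np
    from sklearn.neural_network import MLPClassifier, MLPRegressor

    rng = np.random.RandomState(0)
    X = rng.rand(200, 5)

    binary = MLPClassifier(max_iter=30).fit(X, rng.choice([-1, 1], 200))
    multi = MLPClassifier(max_iter=30).fit(X, rng.choice([0, 1, 2], 200))
    reg = MLPRegressor(max_iter=30).fit(X, rng.rand(200))

    print(binary.out_activation_)  # 'logistic'  (binary classification)
    print(multi.out_activation_)   # 'softmax'   (multiclass)
    print(reg.out_activation_)     # 'identity'  (regression)
    ```

    So softmax is only used for multiclass targets; a single-neuron output layer gets a logistic (sigmoid) activation instead, which answers question 2.
    
    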

    UPDATE: MLPClassifier binarizes labels internally (in a one-vs-all fashion), so the logistic output also works with labels that differ from [0, 1]:

        if not incremental:
            self._label_binarizer = LabelBinarizer()
            self._label_binarizer.fit(y)
            self.classes_ = self._label_binarizer.classes_
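    A quick sketch of what that internal binarization does to the [-1, 1] targets used above: `LabelBinarizer` maps them onto a single 0/1 column, which is exactly what the logistic output neuron expects.

    ```python
    import numpy as np
    from sklearn.preprocessing import LabelBinarizer

    lb = LabelBinarizer()
    y = np.array([-1, 1, 1, -1])

    print(lb.fit_transform(y).ravel())  # [0 1 1 0]
    print(lb.classes_)                  # [-1  1]
    ```
    
    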