Tags: machine-learning, scikit-learn, neural-network, softmax

How scikit learn implements the output layer


  1. In scikit-learn, how many neurons are in the output layer? As stated here, you can only specify the hidden layers and the number of neurons in each, but nothing about the output layer, so I am not sure how scikit-learn implements it.
  2. Does it make sense to use the softmax activation function in an output layer that has only a single neuron?

Solution

  • Test:

    Setup:

    In [227]: %paste
    import numpy as np
    import pandas as pd
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split

    clf = MLPClassifier()
    
    m = 10**3
    n = 64
    
    df = pd.DataFrame(np.random.randint(100, size=(m, n))).add_prefix('x') \
           .assign(y=np.random.choice([-1,1], m))
    
    
    X_train, X_test, y_train, y_test = \
        train_test_split(df.drop(columns='y'), df['y'], test_size=0.2, random_state=33)
    
    clf.fit(X_train, y_train)
    ## -- End pasted text --
    Out[227]:
    MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
           beta_2=0.999, early_stopping=False, epsilon=1e-08,
           hidden_layer_sizes=(100,), learning_rate='constant',
           learning_rate_init=0.001, max_iter=200, momentum=0.9,
           nesterovs_momentum=True, power_t=0.5, random_state=None,
           shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
           verbose=False, warm_start=False)
    

    Number of outputs:

    In [229]: clf.n_outputs_
    Out[229]: 1
    

    Number of layers:

    In [228]: clf.n_layers_
    Out[228]: 3
    

    The number of iterations the solver has run:

    In [230]: clf.n_iter_
    Out[230]: 60
    
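    So for the binary target above, the output layer has exactly one neuron. A small sketch (with made-up random data, not the session above) showing that `n_outputs_` grows to one neuron per class once there are more than two classes:

    ```python
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.RandomState(33)
    X = rng.randint(100, size=(300, 64))

    # Two classes -> a single output neuron
    clf_bin = MLPClassifier(max_iter=50).fit(X, rng.choice([-1, 1], 300))
    print(clf_bin.n_outputs_)    # 1

    # Three classes -> three output neurons (one-vs-all binarization)
    clf_multi = MLPClassifier(max_iter=50).fit(X, rng.choice([0, 1, 2], 300))
    print(clf_multi.n_outputs_)  # 3
    ```
    
    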

    Here is an excerpt of the source code where the activation function for the output layer is chosen:

        # Output for regression
        if not is_classifier(self):
            self.out_activation_ = 'identity'
        # Output for multi class
        elif self._label_binarizer.y_type_ == 'multiclass':
            self.out_activation_ = 'softmax'
        # Output for binary class and multi-label
        else:
            self.out_activation_ = 'logistic'
    
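    You can verify those three branches on fitted estimators via the `out_activation_` attribute (again with made-up random data, just to exercise each case):

    ```python
    import numpy as np
    from sklearn.neural_network import MLPClassifier, MLPRegressor

    rng = np.random.RandomState(0)
    X = rng.rand(200, 5)

    binary = MLPClassifier(max_iter=30).fit(X, rng.choice([-1, 1], 200))
    multi = MLPClassifier(max_iter=30).fit(X, rng.choice([0, 1, 2], 200))
    reg = MLPRegressor(max_iter=30).fit(X, rng.rand(200))

    print(binary.out_activation_)  # 'logistic'  (binary classification)
    print(multi.out_activation_)   # 'softmax'   (multiclass)
    print(reg.out_activation_)     # 'identity'  (regression)
    ```

    So softmax is only used for multiclass targets; a single-neuron output layer gets a logistic (sigmoid) activation instead, which answers question 2.
    
    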

    UPDATE: MLPClassifier binarizes labels internally (in a one-vs-all fashion), so the logistic output also works with labels that differ from [0, 1]:

        if not incremental:
            self._label_binarizer = LabelBinarizer()
            self._label_binarizer.fit(y)
            self.classes_ = self._label_binarizer.classes_
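    A quick sketch of what that internal binarization does to the [-1, 1] targets used above: `LabelBinarizer` maps them onto a single 0/1 column, which is exactly what the logistic output neuron expects.

    ```python
    import numpy as np
    from sklearn.preprocessing import LabelBinarizer

    lb = LabelBinarizer()
    y = np.array([-1, 1, 1, -1])

    print(lb.fit_transform(y).ravel())  # [0 1 1 0]
    print(lb.classes_)                  # [-1  1]
    ```
    
    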