
How does sklearn's MLP predict_proba function work internally?


I am trying to understand how sklearn's MLP Classifier retrieves its results for its predict_proba function.

The website simply lists:

Probability estimates

While many others, such as logistic regression, have more detailed answers: Probability estimates.

The returned estimates for all classes are ordered by the label of classes.

For a multi_class problem, if multi_class is set to be “multinomial” the softmax function is used to find the predicted probability of each class. Else use a one-vs-rest approach, i.e calculate the probability of each class assuming it to be positive using the logistic function. and normalize these values across all the classes.
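The two strategies described in that quote can be sketched numerically. This is my own illustration with made-up per-class decision scores, just to show how each normalization produces a probability vector:

```python
import numpy as np

scores = np.array([2.0, 1.0, 0.1])  # hypothetical per-class decision scores

# "multinomial": softmax over the scores
softmax = np.exp(scores) / np.exp(scores).sum()

# one-vs-rest: logistic function per class, then normalize across classes
ovr = 1 / (1 + np.exp(-scores))
ovr = ovr / ovr.sum()

# both strategies yield vectors that sum to 1
assert np.isclose(softmax.sum(), 1.0)
assert np.isclose(ovr.sum(), 1.0)
```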

Other model types, too, have more detail. Take for example a support vector machine classifier:

Compute probabilities of possible outcomes for samples in X.

The model need to have probability information computed at training time: fit with attribute probability set to True.

And there is also this very nice Stack Overflow post which explains it in depth.

Other Examples

Random Forest:

Predict class probabilities for X.

The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest. The class probability of a single tree is the fraction of samples of the same class in a leaf.
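The averaging described in that quote is easy to verify on a toy dataset (my own sketch, not part of the original question):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=100, random_state=0)
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# the forest's probabilities are the mean of the per-tree probabilities
tree_mean = np.mean([t.predict_proba(X) for t in rf.estimators_], axis=0)
assert np.allclose(rf.predict_proba(X), tree_mean)
```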

Gaussian Process Classifier:

I am looking to understand the same thing as the above post, but for the MLPClassifier. How does MLPClassifier's predict_proba work internally?


Solution

  • Looking within the source code, I found:

    def _initialize(self, y, layer_units):
    
        # set all attributes, allocate weights etc for first call
        # Initialize parameters
        self.n_iter_ = 0
        self.t_ = 0
        self.n_outputs_ = y.shape[1]
    
        # Compute the number of layers
        self.n_layers_ = len(layer_units)
    
        # Output for regression
        if not is_classifier(self):
            self.out_activation_ = 'identity'
        # Output for multi class
        elif self._label_binarizer.y_type_ == 'multiclass':
            self.out_activation_ = 'softmax'
        # Output for binary class and multi-label
        else:
            self.out_activation_ = 'logistic'
    

    It seems that MLPClassifier uses a softmax function for multiclass classification and a logistic function for binary (and multi-label) classification in order to build the output layer. The output of the net is therefore a probability vector, from which the net derives its predictions.
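A quick check on toy data (my own sketch, not from the original answer) confirms which output activation gets picked:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# 3-class toy problem -> the output layer should use softmax
X, y = make_classification(n_samples=100, n_informative=4, n_classes=3, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=300, random_state=0).fit(X, y)
print(clf.out_activation_)  # 'softmax'

# binary toy problem -> the output layer should use the logistic function
Xb, yb = make_classification(n_samples=100, random_state=0)
clf_b = MLPClassifier(hidden_layer_sizes=(10,), max_iter=300, random_state=0).fit(Xb, yb)
print(clf_b.out_activation_)  # 'logistic'
```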

    If I look to the predict_proba method:

    def predict_proba(self, X):
        """Probability estimates.
        Parameters
        ----------
        X : {array-like, sparse matrix} of shape (n_samples, n_features)
            The input data.
        Returns
        -------
        y_prob : ndarray of shape (n_samples, n_classes)
            The predicted probability of the sample for each class in the
            model, where classes are ordered as they are in `self.classes_`.
        """
        check_is_fitted(self)
        y_pred = self._predict(X)
    
        if self.n_outputs_ == 1:
            y_pred = y_pred.ravel()
    
        if y_pred.ndim == 1:
            return np.vstack([1 - y_pred, y_pred]).T
        else:
            return y_pred
    

    This confirms that the softmax or logistic activation on the output layer is what produces the probability vector: in the multiclass case the network output is returned as-is, while in the binary case the single logistic output p is stacked into the two-column array [1 - p, p].
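As a sanity check (again my own sketch on a toy dataset), the binary case behaves exactly as that vstack suggests:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=100, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=300, random_state=0).fit(X, y)
proba = clf.predict_proba(X)

# binary case: the single logistic output p is stacked as [1 - p, p],
# so the two columns are complementary and each row sums to 1
assert np.allclose(proba[:, 0], 1 - proba[:, 1])
assert np.allclose(proba.sum(axis=1), 1.0)
```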

    Hope this helps.