sklearn mutual_info_classif returns different values depending on feature order

I noticed behavior in sklearn's mutual_info_classif function that is inconsistent with what I expect from mutual information.

Given a set of columns ['A', 'B', 'C'] and a dependent variable y, the mutual information could be computed either between all features jointly and y (a single scalar) or between each individual feature and y (one scalar per feature). I'm not sure which of these sklearn is returning, because the values change both with the order of the features and with the number of features passed in.
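For reference, the two quantities can be written as follows, with X_A, X_B, X_C the feature columns and Y the target (the notation is only for illustration). scikit-learn's documentation describes mutual_info_classif as returning one estimate per feature, in nats, which corresponds to the first form evaluated for each column j.

```latex
% Per-feature (univariate) MI: one scalar for each column j
I(X_j; Y) = \sum_{y} \int p(x_j, y)\,\log\frac{p(x_j, y)}{p(x_j)\,p(y)}\,dx_j,
\qquad j \in \{A, B, C\}

% Joint MI between all features together and Y: a single scalar
I(X_A, X_B, X_C; Y) = \sum_{y} \int p(\mathbf{x}, y)\,\log\frac{p(\mathbf{x}, y)}{p(\mathbf{x})\,p(y)}\,d\mathbf{x},
\qquad \mathbf{x} = (x_A, x_B, x_C)
```

Each I(X_j; Y) involves only column j and Y, so its true value does not depend on which other columns are present or on their order.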

The mutual information that sklearn reports between a given feature and the dependent variable changes with the column order. For example, these two calls give different scores for B and C:

feature_scores = mutual_info_classif(X[['A', 'B', 'C']], y, random_state=0)
feature_scores

array([0.        , 0.13, 0.045])

feature_scores = mutual_info_classif(X[['A', 'C', 'B']], y, random_state=0)
feature_scores

array([0.        , 0.017, 0.14])

Changing the set of features also changes the scores. Here the score for B differs depending on whether C is included:

feature_scores = mutual_info_classif(X[['A', 'B']], y, random_state=0)
feature_scores

array([0.        , 0.14])

feature_scores = mutual_info_classif(X[['A', 'B', 'C']], y, random_state=0)
feature_scores

array([0.        , 0.13, 0.045])

Can anyone explain this behavior, and why it is considered correct?

Solution

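This comes from the random noise that mutual_info_classif adds before estimating MI. Features treated as continuous (the default for dense input) get a tiny amount of random noise added to break ties between repeated values, and the MI is then estimated with a k-nearest-neighbors method. The noise is drawn from a single random number generator seeded with random_state, as one array covering all continuous columns at once. So even with the same seed, a given column receives different noise depending on its position and on how many other columns are passed. The noise is very small, but when a feature has many repeated values it decides how ties are broken, which changes the neighbor counts and therefore the estimate.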
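The effect of sharing one generator can be illustrated with plain NumPy. This is a simplified stand-in, not scikit-learn's actual code: `noise_for_columns` is a hypothetical helper that mimics seeding one generator and drawing a single (n_samples, n_columns) noise array, where column j of the data gets column j of that array. The scaling of the noise that scikit-learn applies is omitted.

```python
import numpy as np

n_samples = 5


def noise_for_columns(columns, seed=0):
    """Hypothetical stand-in for the noise step: one seeded generator draws
    a single (n_samples, n_columns) array, and column j gets noise[:, j]."""
    rng = np.random.RandomState(seed)
    noise = rng.standard_normal(size=(n_samples, len(columns)))
    return {col: noise[:, j] for j, col in enumerate(columns)}


abc = noise_for_columns(["A", "B", "C"])
acb = noise_for_columns(["A", "C", "B"])
ab = noise_for_columns(["A", "B"])

# Same seed every time, yet "B" receives different noise in each call.
print(np.allclose(abc["B"], acb["B"]))  # False: B moved from position 1 to 2
print(np.allclose(abc["B"], ab["B"]))   # False: a (5, 2) array is filled differently than a (5, 3) one
# "A" stays at position 0 in both 3-column calls, so its noise is identical.
print(np.allclose(abc["A"], acb["A"]))  # True
```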

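To get scores that don't depend on column order or on the other columns, compute the MI for each column in a separate call, passing the same random_state each time. Each call then starts from a freshly seeded generator and only ever sees one column.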
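A minimal sketch of the per-column approach, assuming X is a pandas DataFrame as in the question. `per_column_mi` is a hypothetical helper name, and the toy data is made up only to exercise it; `mutual_info_classif` and its `random_state` parameter are the only scikit-learn pieces used.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif


def per_column_mi(X: pd.DataFrame, y, random_state=0, **kwargs) -> pd.Series:
    """Hypothetical helper: estimate MI between each column of X and y in a
    separate call, so every column gets identically seeded noise."""
    scores = {
        col: mutual_info_classif(X[[col]], y, random_state=random_state, **kwargs)[0]
        for col in X.columns
    }
    return pd.Series(scores)


# Made-up toy data: small-integer features produce many tied values,
# which is where the tie-breaking noise matters most.
rng = np.random.RandomState(42)
X = pd.DataFrame(rng.randint(0, 5, size=(200, 3)).astype(float), columns=["A", "B", "C"])
y = (X["B"] + rng.randint(0, 3, size=200) > 4).astype(int)

abc = per_column_mi(X[["A", "B", "C"]], y)
acb = per_column_mi(X[["A", "C", "B"]], y)
ab = per_column_mi(X[["A", "B"]], y)
print(abc)
print(acb[abc.index])  # same values as abc, only reordered
```

These per-column scores agree across orderings and subsets, but they will generally not equal the scores from a single multi-column call, since that call applied different noise. If the features are genuinely discrete, passing `discrete_features=True` is another option: the noise is only added to features treated as continuous, and discrete features are scored with a different estimator that does not use it.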