I'm trying to select a couple hundred features out of 60,000, and for this I want to use mutual_info_classif.
But I get different results when I call mutual_info_classif directly compared with using it through SelectKBest.
To demonstrate, I define a small DataFrame in which only one column is correlated with the target:
   A  B  C  D  E  target
0  1  1  1  1  1       1
1  2  3  2  2  2       0
2  3  3  3  3  3       0
3  4  3  4  4  4       0
4  5  1  5  5  5       1
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import mutual_info_classif
df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [1, 3, 3, 3, 1],
                   'C': [1, 2, 3, 4, 5],
                   'D': [1, 2, 3, 4, 5],
                   'E': [1, 2, 3, 4, 5],
                   'target': [1, 0, 0, 0, 1]})
X = df.drop(['target'],axis=1)
y = df.target
threshold = 3 # the number of most relevant features
Then I get the MI scores by calling mutual_info_classif directly:
high_score_features1 = []
feature_scores = mutual_info_classif(X, y, random_state=0, n_neighbors=3, discrete_features='auto')
for score, f_name in sorted(zip(feature_scores, X.columns), reverse=True)[:threshold]:
    print(f_name, score)
    high_score_features1.append(f_name)
feature_scores
Output:
B 0.48333333333333306
E 0.0
D 0.0
array([0. , 0.48333333, 0. , 0. , 0. ])
Then I use SelectKBest, and to ensure the same parameters are used, I pass my own score function:
def my_func(X, y):
    return mutual_info_classif(X, y, random_state=0, n_neighbors=3, discrete_features='auto')
high_score_features1 = []
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
f_selector = SelectKBest(score_func=my_func, k=threshold)
f_selector.fit(X_train, y_train)
for score, f_name in sorted(zip(f_selector.scores_, X.columns), reverse=True)[:threshold]:
    print(f_name, score)
    high_score_features1.append(f_name)
f_selector.scores_
Output:
B 0.8333333333333331
E 0.0
D 0.0
array([0. , 0.83333333, 0. , 0. , 0. ])
I don't understand the source of the difference, and I'm not sure which approach is more reliable for my real data.
It seems that the reason you're getting different results between calling mutual_info_classif directly and going through SelectKBest is that you're scoring different datasets. Your SelectKBest selector is fitted on a training split, whereas your direct mutual_info_classif call scores the entire data. If you run both on the same data, they give identical output.
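For example, reusing the df, X, y, my_func, and threshold defined in your question, both routes agree once they score the same rows (a quick check; the commented arrays are the two outputs you posted):

# Fit SelectKBest on the full data, just like the direct call:
f_selector = SelectKBest(score_func=my_func, k=threshold)
f_selector.fit(X, y)
print(f_selector.scores_)
# array([0. , 0.48333333, 0. , 0. , 0. ])  <- matches mutual_info_classif(X, y, ...)

# Conversely, score only the training split, just like the SelectKBest call in the question:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
print(my_func(X_train, y_train))
# array([0. , 0.83333333, 0. , 0. , 0. ])  <- matches f_selector.scores_ from the question

As an aside, instead of defining my_func you could pass functools.partial(mutual_info_classif, random_state=0, n_neighbors=3, discrete_features='auto') as score_func; either way, the scores depend only on the data the selector is fitted on.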