I have a CPU with 8 cores/16 threads, and I use cross_val_score and XGBRegressor, both with njobs=6, but they actually use only one core (in htop only one CPU shows 100% load; the rest are at 0%).
for i, n_est in enumerate(range(20, 105, 5)):
    for j, m_dep in enumerate(range(3, 10, 2)):
        for k, l_rate in enumerate([0.0001, 0.001, 0.01, 0.1]):
            sc = cross_val_score(estimator=xgb.XGBRegressor(njobs=6,
                                                            max_depth=m_dep,
                                                            learning_rate=l_rate,
                                                            n_estimators=n_est),
                                 X=X_train,
                                 y=y_train,
                                 cv=5,
                                 scoring='r2',
                                 n_jobs=6)
            res[i, j, k] = np.mean(sc)
            l += 1
            print(l, end='')
What's wrong here? cross_val_score should be easy to parallelize, since it fits 5 independent models on 5 independent folds, shouldn't it?
OK, it looks like

from joblib import parallel_backend
parallel_backend(backend='threading', n_jobs=-1)

helped. Now the CPU uses 4-5 cores at least part of the time, and the calculation proceeds faster. (Note that parallel_backend is normally used as a context manager, i.e. in a with statement, so that the previous backend is restored afterwards.)
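A minimal sketch of the context-manager usage, with a plain sklearn estimator (Ridge) and synthetic data standing in for the original XGBRegressor and training set:

```python
from joblib import parallel_backend
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Small synthetic data set as a stand-in for X_train, y_train.
X, y = make_regression(n_samples=200, n_features=10, random_state=0)

# Inside the with-block, cross_val_score's n_jobs dispatches to the
# threading backend; the previous backend is restored on exit.
with parallel_backend('threading', n_jobs=-1):
    scores = cross_val_score(Ridge(), X, y, cv=5, scoring='r2')

print(scores.shape)  # -> (5,) one score per fold
```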
UPD. The XGBoost parameter nthread=8 really solved the problem: now all cores are at 100% load. The likely explanation for the original behaviour is that the scikit-learn wrapper's parameter is spelled n_jobs (nthread in older releases), so the misspelled njobs was silently ignored and XGBoost fell back to a single thread.