I have a CPU with 8 cores/16 threads, and I use cross_val_score and XGBRegressor, both with njobs=6, but they actually use only one core (in htop only one CPU shows 100% load; the rest are at 0%).
for i, n_est in enumerate(range(20, 105, 5)):
    for j, m_dep in enumerate(range(3, 10, 2)):
        for k, l_rate in enumerate([0.0001, 0.001, 0.01, 0.1]):
            sc = cross_val_score(estimator=xgb.XGBRegressor(njobs=6,
                                                            max_depth=m_dep,
                                                            learning_rate=l_rate,
                                                            n_estimators=n_est),
                                 X=X_train,
                                 y=y_train,
                                 cv=5,
                                 scoring='r2',
                                 n_jobs=6)
            res[i, j, k] = np.mean(sc)
            l += 1
            print(l, end='')
What's wrong here? cross_val_score should be easy to parallelize, since it fits 5 independent models on 5 independent folds, shouldn't it?
OK, it looks like

from joblib import parallel_backend
parallel_backend(backend='threading', n_jobs=-1)

helped. Now the CPU uses 4-5 cores at least part of the time, and the calculation proceeds faster. (Note that parallel_backend is normally used as a context manager, i.e. in a with statement, so that the previous backend is restored afterwards.)
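A minimal sketch of the context-manager usage, with a plain sklearn estimator (Ridge) and synthetic data standing in for the original XGBRegressor and training set:

```python
from joblib import parallel_backend
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Small synthetic data set as a stand-in for X_train, y_train.
X, y = make_regression(n_samples=200, n_features=10, random_state=0)

# Inside the with-block, cross_val_score's n_jobs dispatches to the
# threading backend; the previous backend is restored on exit.
with parallel_backend('threading', n_jobs=-1):
    scores = cross_val_score(Ridge(), X, y, cv=5, scoring='r2')

print(scores.shape)  # -> (5,) one score per fold
```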
UPD. The XGBoost parameter nthread=8 really solved the problem: now all cores are at 100% load. The likely explanation for the original behaviour is that the scikit-learn wrapper's parameter is spelled n_jobs (nthread in older releases), so the misspelled njobs was silently ignored and XGBoost fell back to a single thread.