Tags: python, machine-learning, grid-search, lightgbm, gridsearchcv

How to save every predicted result in each iteration of GridSearchCV with LightGBM


I am trying to use GridSearchCV to tune the parameters of a LightGBM model, but I am not familiar enough with GridSearchCV to save each predicted result from each of its iterations.
Sadly, I only know how to save the result for one specific set of parameters.
Here is the code:

import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

param = {
    'bagging_freq': 5,
    'bagging_fraction': 0.4,
    'boost_from_average':'false',
    'boost': 'gbdt',
    'feature_fraction': 0.05,
    'learning_rate': 0.01,
    'max_depth': -1,  
    'metric':'auc',
    'min_data_in_leaf': 80,
    'min_sum_hessian_in_leaf': 10.0,
    'num_leaves': 13,
    'num_threads': 8,
    'tree_learner': 'serial',
    'objective': 'binary', 
    'verbosity': 1
}
features = [c for c in train_df.columns if c not in ['ID_code', 'target']]
target = train_df['target']
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=44000)  # random_state only takes effect with shuffle=True
oof = np.zeros(len(train_df))
predictions = np.zeros(len(test_df))

for fold_, (trn_idx, val_idx) in enumerate(folds.split(train_df.values, target.values)):
    print("Fold {}".format(fold_))
    trn_data = lgb.Dataset(train_df.iloc[trn_idx][features], label=target.iloc[trn_idx])
    val_data = lgb.Dataset(train_df.iloc[val_idx][features], label=target.iloc[val_idx])    
    num_round = 1000000
    clf = lgb.train(param, trn_data, num_round, valid_sets = [trn_data, val_data], verbose_eval=1000, early_stopping_rounds = 3000)
    oof[val_idx] = clf.predict(train_df.iloc[val_idx][features], num_iteration=clf.best_iteration)        
    predictions += clf.predict(test_df[features], num_iteration=clf.best_iteration) / folds.n_splits

print("CV score: {:<8.5f}".format(roc_auc_score(target, oof)))
print('Saving the Result File')
res = pd.DataFrame({"ID_code": test_df.ID_code.values})
res["target"] = predictions
res.to_csv('result_10fold{}.csv'.format(num_sub), index=False)

Here is the data:

train_df.head(3)

         ID_code    target    var_0    var_1    ...  var_199
0        train_0     0        8.9255   -6.7863       -9.2834     
1        train_1     1        11.5006  -4.1473        7.0433  
2        train_2     0        8.6093   -2.7457       -9.0837 


test_df.head(3)

         ID_code    var_0   var_1    ... var_199
0        test_0     9.4292  11.4327      -2.3805          
1        test_1     5.0930  11.4607      -9.2834      
2        test_2     7.8928  10.5825      -9.0837      

I want to save the predictions from each iteration of GridSearchCV. I have searched several similar questions and other relevant material on using GridSearchCV with LightGBM,
but I still can't get the code right.
So, if you don't mind, could anyone help me or point me to some tutorials on this?
Thanks sincerely.


Solution

  • You can use ParameterGrid or ParameterSampler from sklearn to do the parameter sampling; they correspond to GridSearchCV and RandomizedSearchCV, respectively. For example,

    params = {
        # your base parameters, e.g. the param dict from the question
    }

    def train_lgb(num_folds=11, param=params):
        # wrap the cross-validation training loop from the question here,
        # returning the OOF predictions and the submission DataFrame
        ...
        return predictions, sub

    # define the grid for parameter sampling
    from sklearn.model_selection import ParameterGrid
    par_grid = ParameterGrid([{'bagging_freq': [6, 7]},
                              {'num_leaves': [13, 15]}
                             ])

    prediction_list = {}
    sub_list = {}

    import copy
    for i, ps in enumerate(par_grid):
        print('This is param{}'.format(i))
        # copy the base params dictionary and update with the sampled values
        val = copy.deepcopy(params)
        val.update(ps)
        # main training loop
        prediction, sub = train_lgb(param=val)
        prediction_list[i] = prediction
        sub_list[i] = sub
    
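    The dictionaries above only hold the results in memory; to actually save every predicted result you can dump each entry to disk, e.g. one CSV per parameter combination. A minimal sketch (file names are illustrative):

    # write one submission file per sampled parameter combination
    for i, sub in sub_list.items():
        sub.to_csv('result_param{}.csv'.format(i), index=False)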

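    For the random-sampling variant mentioned above (the ParameterSampler analogue of RandomizedSearchCV), the loop looks the same; only the iterable changes. A minimal sketch, with illustrative n_iter and random_state values:

    from sklearn.model_selection import ParameterSampler

    # draw 3 random combinations out of the 4 possible ones
    par_sampler = ParameterSampler({'bagging_freq': [6, 7],
                                    'num_leaves': [13, 15]},
                                   n_iter=3, random_state=42)
    for i, ps in enumerate(par_sampler):
        print('Sampled param set {}: {}'.format(i, ps))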
    Edit: By the way, I realized that I was investigating the same issue recently and was learning how to address it with some ML tools. I've created a page summarising how to use MLflow for this task: https://mlisovyi.github.io/KaggleSantander2019/ (and the associated GitHub repo with the actual code). Note that, by coincidence, it is based on the same data that you are working on :). I hope it will be useful.
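
    For a rough idea of what per-iteration tracking with MLflow can look like, here is a minimal sketch (not the code from the linked page; cv_scores is an assumed list of per-run CV AUC values, and the file names follow the saving sketch above):

    import mlflow

    for i, ps in enumerate(par_grid):
        with mlflow.start_run():
            mlflow.log_params(ps)                      # the sampled parameters
            mlflow.log_metric('cv_auc', cv_scores[i])  # assumed per-run CV score
            mlflow.log_artifact('result_param{}.csv'.format(i))  # saved predictions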