
LightGBM GridSearchCV hanging forever with n_jobs=1


I read the previous posts about LightGBM hanging when used with GridSearchCV() and have corrected my code accordingly. But the code still seems to hang, for more than 3 hours now!

I have 8 GB of RAM. The data has 29,802 rows and 13 columns. Most of the columns are categorical values that have been label-encoded as numbers.

Please see the code below; I am awaiting your valuable suggestions!

Initially I got an AUC of 89% with lgb.train().

But after switching to LGBMClassifier() I am nowhere near that. Hence I opted for GridSearchCV().

I needed LGBMClassifier() because I wanted score() and the other convenient wrappers, which I could not find when using lgb.train().

I have commented out most of my parameter settings for now, but the grid search still does not seem to end :(

X and y are my complete training data set:

params = {
    'boosting_type': 'gbdt',
    'max_depth': 15,
    'objective': 'binary',
    #'nthread': 1,  # Updated from nthread
    'num_leaves': 30,
    'learning_rate': 0.001,
    #'max_bin': 512,
    #'subsample_for_bin': 200,
    'subsample': 0.8,
    'subsample_freq': 500,
    #'colsample_bytree': 0.8,
    #'reg_alpha': 5,
    #'reg_lambda': 10,
    #'min_split_gain': 0.5,
    #'min_child_weight': 1,
    #'min_child_samples': 5,
    #'scale_pos_weight': 1,
    #'num_class': 1,
    'metric': 'roc_auc',
    'early_stopping': 10,
    'n_jobs': 1,
}

gridParams = {
    'learning_rate': [0.001, 0.01],
    'n_estimators': [1000],
    'num_leaves': [12, 30, 80],
    'boosting_type': ['gbdt'],
    'objective': ['binary'],
    'random_state': [1],  # Updated from 'seed'
    'colsample_bytree': [0.8, 1],
    'subsample': [0.5, 0.7, 0.75],
    'reg_alpha': [0.1, 1.2],
    'reg_lambda': [0.1, 1.2],
    'subsample_freq': [500, 1000],
    'max_depth': [15, 30, 80],
}
mdl = lgb.LGBMClassifier(**params)

grid = GridSearchCV(mdl, gridParams, return_train_score=True,
                    verbose=1,
                    cv=4,
                    n_jobs=1,  # only '1' will work
                    scoring='roc_auc')

grid.fit(X=X, y=y, eval_set=[[X, y]], early_stopping_rounds=10)  # never-ending code

Output:

Fitting 4 folds for each of 864 candidates, totalling 3456 fits
[1] valid_0's binary_logloss: 0.686044
Training until validation scores don't improve for 10 rounds.
[2] valid_0's binary_logloss: 0.685749
[3] valid_0's binary_logloss: 0.685433
[4] valid_0's binary_logloss: 0.685134
[5] valid_0's binary_logloss: 0.684831
[6] valid_0's binary_logloss: 0.684517
[7] valid_0's binary_logloss: 0.684218
[8] valid_0's binary_logloss: 0.683904
[9] valid_0's binary_logloss: 0.683608
[10]    valid_0's binary_logloss: 0.683308
[11]    valid_0's binary_logloss: 0.683009
[12]    valid_0's binary_logloss: 0.68271
[13]    valid_0's binary_logloss: 0.682416
[14]    valid_0's binary_logloss: 0.682123
[15]    valid_0's binary_logloss: 0.681814
[16]    valid_0's binary_logloss: 0.681522
[17]    valid_0's binary_logloss: 0.681217
[18]    valid_0's binary_logloss: 0.680922
[19]    valid_0's binary_logloss: 0.680628
[20]    valid_0's binary_logloss: 0.680322
[21]    valid_0's binary_logloss: 0.680029
[22]    valid_0's binary_logloss: 0.679736
[23]    valid_0's binary_logloss: 0.679443
[24]    valid_0's binary_logloss: 0.679151
[25]    valid_0's binary_logloss: 0.678848
[26]    valid_0's binary_logloss: 0.678546
[27]    valid_0's binary_logloss: 0.678262
[28]    valid_0's binary_logloss: 0.677974
[29]    valid_0's binary_logloss: 0.677675
[30]    valid_0's binary_logloss: 0.677393
[31]    valid_0's binary_logloss: 0.677093
...
[997]   valid_0's binary_logloss: 0.537612
[998]   valid_0's binary_logloss: 0.537544
[999]   valid_0's binary_logloss: 0.537481
[1000]  valid_0's binary_logloss: 0.53741
Did not meet early stopping. Best iteration is:
[1000]  valid_0's binary_logloss: 0.53741

... and it goes on and on ...

Please help!

Regards, Sherin


Solution

• Your problem is different from the hanging described in those earlier posts. You are training a very large number of models (864 candidates × 4 CV folds = 3456 fits), each with many (1000) deep trees (12 to 80 leaves, max_depth up to 80), so training simply takes a very long time. The solution is to be more modest with tree complexity (most practically, fix max_depth to -1 and vary only the number of leaves in the grid search; for your dataset size, perhaps between 10 and 40 leaves), to reduce the number of grid points (864 grid points is A LOT), or to reduce the number of trees (= iterations) from 1000 to, say, 100 (a random pick), or better, by having meaningful early stopping. A trimmed-down sketch follows this list.

• One obvious issue: there is no point in using the training data itself as the early stopping criterion (eval_set=[[X, y]], early_stopping_rounds=10). The objective keeps improving on the training set indefinitely, so early stopping never triggers and training only stops by reaching the maximum number of iterations (1000 trees in your case). Use a held-out validation set instead, as in the sketch below.
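Putting both points together, here is a minimal sketch of what a more tractable search could look like. It assumes X and y as in the question; the 80/20 split and the specific grid values are illustrative picks, not tuned recommendations, and the fit-time early_stopping_rounds argument follows the same older sklearn-wrapper API used in the question (recent LightGBM versions pass an lgb.early_stopping callback instead):

import lightgbm as lgb
from sklearn.model_selection import GridSearchCV, train_test_split

# Hold out part of the data so early stopping monitors unseen samples,
# not the training set itself.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y)

mdl = lgb.LGBMClassifier(
    boosting_type='gbdt',
    objective='binary',
    max_depth=-1,        # unlimited depth; control complexity via num_leaves
    n_estimators=1000,   # upper bound only; early stopping cuts it short
    random_state=1,
    n_jobs=1,
)

# 2 * 3 * 2 = 12 candidates * 4 folds = 48 fits, instead of 3456.
gridParams = {
    'learning_rate': [0.01, 0.1],
    'num_leaves': [10, 20, 40],      # modest sizes for ~30k rows
    'colsample_bytree': [0.8, 1.0],
}

grid = GridSearchCV(mdl, gridParams, cv=4, scoring='roc_auc',
                    n_jobs=1, verbose=1)
grid.fit(X_train, y_train,
         eval_set=[(X_valid, y_valid)],  # held-out data, not the training set
         eval_metric='auc',              # LightGBM's name for ROC AUC is 'auc'
         early_stopping_rounds=10)

print(grid.best_params_, grid.best_score_)

With each fit stopped early on the held-out set, the search finishes in a fraction of the original time, and grid.best_estimator_ still exposes score() and the other sklearn wrappers the question asked for.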