
Why is the R2 score zero in LightGBM?


import numpy as np
import pandas as pd
import lightgbm
from sklearn.metrics import r2_score,mean_squared_error as MSE
dataset = pd.read_excel("Prali Marble.xlsx")
X = dataset.iloc[:,2].values.reshape((-1, 1))
Y = dataset.iloc[:,3].values

from lightgbm import LGBMRegressor
lgb_r = LGBMRegressor()
lgb_r.fit(X,Y)
y_pred = lgb_r.predict(X)
print("LGBM R2_SCORE:", r2_score(Y, y_pred))

The R2 score comes out as ZERO. Why does LGBMRegressor produce a zero value? I did not split my data into train/test sets because my dataset is small.


Solution

  • This example is not fully reproducible since the content of "Prali Marble.xlsx" is not included.

    However, I can reproduce a 0.0 R2 with the following code that I think closely matches your example. Similar to your code, this trains a LightGBM regression model on a dataset with a single feature.

    This code uses lightgbm 3.1.1 on Python 3.8.

    import numpy as np
    import pandas as pd
    import lightgbm as lgb
    from sklearn.metrics import r2_score
    
    X = pd.DataFrame({
        "feat1": np.append(np.repeat(0.5, 99), np.ones(1))
    })
    Y = np.random.random(100)
    
    lgb_r = lgb.LGBMRegressor()
    lgb_r.fit(X,Y)
    
    y_pred = lgb_r.predict(X)
    print("LGBM R2_SCORE:", r2_score(Y, y_pred))
    

    LGBM R2_SCORE: 0.0

    In this case, the R2 is 0 because the model is just predicting the mean of Y. You can see this by examining the structure of the model.

    lgb_r.booster_.trees_to_dataframe()
    

    That will return a 1-row dataframe, which happens when LightGBM does not add any trees.
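As a sanity check that does not require LightGBM, here is a minimal sketch (plain Python) of why a constant mean prediction always scores exactly 0: by definition R2 = 1 - SS_res / SS_tot, where SS_tot measures deviation from mean(y), so when every prediction equals mean(y) the two sums are identical and the ratio is 1.

```python
def r2(y_true, y_pred):
    # R2 = 1 - SS_res / SS_tot (same definition sklearn's r2_score uses)
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y = [0.2, 0.5, 0.9, 0.4]

# A tree-less boosted model just predicts the mean of the target
mean_pred = [sum(y) / len(y)] * len(y)

print(r2(y, mean_pred))  # → 0.0
```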

    LightGBM has some parameters that are used to prevent overfitting. Two are relevant here:

    • min_data_in_leaf (default 20): the minimum number of samples that must fall into each leaf for a split to be kept. Here the only candidate split would put a single sample (the lone 1.0) in one leaf, so the split is rejected.
    • min_sum_hessian_in_leaf (default 1e-3): the minimum sum of the Hessian required in each leaf; for L2 regression the Hessian is 1.0 per sample, so this is effectively another per-leaf sample count.

    You can tell LightGBM to ignore these overfitting protections by setting both parameters to 0.

    import numpy as np
    import pandas as pd
    import lightgbm as lgb
    from sklearn.metrics import r2_score
    
    X = pd.DataFrame({
        "feat1": np.append(np.repeat(0.5, 99), np.ones(1))
    })
    Y = np.random.random(100, )
    
    lgb_r = lgb.LGBMRegressor(
        min_data_in_leaf=0,
        min_sum_hessian_in_leaf=0.0
    )
    lgb_r.fit(X,Y)
    
    y_pred = lgb_r.predict(X)
    print("LGBM R2_SCORE:", r2_score(Y, y_pred))