python flask scikit-learn lightgbm

LightGBM model predicts the same value for every row in a Flask route


I'm new to the Stack Overflow community, so thanks in advance for the help. Here is the situation I'm facing: I have a model.py file that trains a LightGBM regressor (LGBMRegressor) using scikit-learn's RandomizedSearchCV. After training, I save the fitted search object with pickle.

    import pickle

    import lightgbm as lgb
    import numpy as np
    from sklearn.model_selection import RandomizedSearchCV

    # data_base, features_x and target_y are defined earlier in model.py (not shown)

    # Candidate values for the random search
    n_estimators = [int(x) for x in np.linspace(start=200, stop=4000, num=20)]
    max_depth = [int(x) for x in np.linspace(10, 100, num=10)]
    num_leaves = [int(x) for x in np.linspace(10, 150, num=10)]
    learning_rate = [0.03, 0.05, 0.1, 0.2, 0.3]
    subsample_for_bin = [100000, 200000, 300000, 400000]
    random_grid = {'n_estimators': n_estimators,
                   'max_depth': max_depth,
                   'num_leaves': num_leaves,
                   'learning_rate': learning_rate,
                   'subsample_for_bin': subsample_for_bin}
    gbm = lgb.LGBMRegressor()
    # Score with MAE and RMSE; refit the best model by RMSE
    gbm_random = RandomizedSearchCV(estimator=gbm, param_distributions=random_grid,
                                    scoring=['neg_mean_absolute_error', 'neg_root_mean_squared_error'],
                                    refit='neg_root_mean_squared_error', n_iter=100, cv=4,
                                    verbose=2, random_state=42, n_jobs=-1)
    gbm_random.fit(data_base[features_x], data_base[target_y])
    # Persist the whole fitted search object (not only the best estimator)
    pkl_filename = "../output/lightGBM[3].pkl"
    with open(pkl_filename, 'wb') as file:
        pickle.dump(gbm_random, file)

To validate the training, I load the model with pickle in the predict.py file and run it on the test set.

import pickle

import pandas as pd

# features_x: the feature column list, as in model.py (definition not shown)
data_base_test = pd.read_csv("../output/table_test3.csv")
pkl_filename = "../output/lightGBM[3].pkl"
with open(pkl_filename, 'rb') as file:
    gbm = pickle.load(file)
predict_test = gbm.predict(data_base_test[features_x])
print(predict_test)

The predict_test is:

[0.66487458 0.82479892 1.89628195 ... 3.83358101 5.21799368 0.33858825]

I'm comfortable with the machine learning side but a complete beginner at web development. When I build a web app with Flask, load the model for the route, and try to make predictions on the same test set as in the previous script, every prediction has the same value, 66. What could be causing this? Note: get_json receives the entire test set in JSON format.

import pickle

import flask
import numpy as np
import pandas as pd
from flask import request

# Load the pickled search object once, at startup
pkl_filename = "model/lightGBM[3].pkl"
with open(pkl_filename, 'rb') as file:
    gbm = pickle.load(file)

app = flask.Flask(__name__, template_folder='templates')

@app.route('/predict', methods=['POST'])
def main():
    test_json = request.get_json()
    df_json = pd.read_json(test_json, orient='records')
    columns_name = df_json.columns.values
    # intended to drop the target column 'qtde_venda' from the feature list
    columns_name = np.delete(columns_name, np.where('qtde_venda'))
    features_x = columns_name.tolist()
    # prediction
    predict = gbm.predict(df_json[features_x])
    print(predict)
    return flask.render_template('main.html')


if __name__ == '__main__':
    app.run()

The predict vector is:

[66. 66. 66. ... 66. 66. 66.]

Expected output vs. actual output

[0.66487458 0.82479892 1.89628195 ... 3.83358101 5.21799368 0.33858825]
[66. 66. 66. ... 66. 66. 66.]
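
One thing that may be worth ruling out in the route above, whatever the root cause turns out to be: `np.where('qtde_venda')` is given a plain string rather than a comparison such as `columns_name == 'qtde_venda'`, so it does not look the column up by name. Also, `get_json()` returns an already-parsed object unless the client sent a JSON-encoded string. Below is a minimal sketch, not the asker's code, of building the feature frame by column name. The helper names `payload_to_frame` and `select_features`, the `training_columns` parameter, and the `preco` and `loja` columns are all hypothetical; only `qtde_venda` comes from the question.

```python
import json
from io import StringIO

import pandas as pd

TARGET = "qtde_venda"  # target column name taken from the question


def payload_to_frame(payload):
    """Build a DataFrame from what request.get_json() may return.

    Accepts either a JSON string in 'records' layout or an
    already-parsed list of row dicts.
    """
    if isinstance(payload, str):
        # Wrap in StringIO so read_json treats it as JSON text, not a path
        return pd.read_json(StringIO(payload), orient="records")
    return pd.DataFrame(payload)


def select_features(df, target=TARGET, training_columns=None):
    """Drop the target by name; optionally reorder to the training columns."""
    features = df.drop(columns=[target], errors="ignore")
    if training_columns is not None:
        # Raises KeyError if a training column is missing from the payload
        features = features[list(training_columns)]
    return features


# Hypothetical two-row payload (column names other than the target are made up)
records = [
    {"qtde_venda": 3, "preco": 9.5, "loja": 1},
    {"qtde_venda": 5, "preco": 7.0, "loja": 2},
]

from_list = select_features(payload_to_frame(records))
from_text = select_features(payload_to_frame(json.dumps(records)),
                            training_columns=["loja", "preco"])
print(from_list.columns.tolist())
print(from_text.columns.tolist())
```

Passing the same `features_x` list that was used for training as `training_columns` would keep the column set and order in the route identical to what the model saw in model.py.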

Solution

  • I can't explain exactly what happened, but the Anaconda environment was causing the error. To solve it, I removed Anaconda and started using a Python venv instead.
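
Since the fix came from switching environments, one way to narrow down this kind of problem is to compare what each environment actually has installed. The following is a rough sketch, assuming Python 3.8+ for `importlib.metadata`. The function name `environment_report` and the package list are illustrative choices, not something from the original post. Running it once under the interpreter that trained the model and once under the one that serves the Flask app shows whether the library versions differ.

```python
import sys
from importlib import metadata

# Packages the question relies on; adjust the list to your own project
PACKAGES = ["lightgbm", "scikit-learn", "numpy", "pandas", "flask"]


def environment_report(packages=PACKAGES):
    """Return the interpreter details and the installed version of each package.

    A package that is not installed in the current environment maps to None.
    """
    report = {
        "python": sys.version.split()[0],
        "executable": sys.executable,
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

A pickled model stores no record of the library versions it was created with, so saving this report next to the .pkl file at training time makes later mismatches easier to spot.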