python, java, machine-learning, xgboost, lightgbm

How to save XGBoost/LightGBM model to PostgreSQL database in Python for subsequent inference in Java?


I'm restricted to PostgreSQL as 'model storage' for the models themselves or their respective components (coefficients, ..). Obviously, PostgreSQL is far from being a fully-fledged model storage, so I can't rule out that I'll have to implement the whole model training process in Java [...].

I couldn't find a solution that uses a PostgreSQL database as intermediate storage for the models. Writing files directly to disk or other storage isn't really an option for me. I considered calling Python code from within the Java application, but I don't know whether this would be an efficient solution for subsequent inference tasks and beyond [...]. Are there ways to serialize to PMML or other formats that can be loaded via Java implementations of the algorithms? Or ways to use the model definitions/parameters directly to reproduce the model [...]?


Solution

  • One way I found is to save the LightGBM model to a string in Python and then store it in a corresponding Postgres column (character varying):

    # booster_ and best_iteration_ are attributes of a fitted LGBMModel (scikit-learn API)
    model_string = booster_.model_to_string(num_iteration=best_iteration_)
    

    In Java, you can use LightGBM4j, an (unofficial) Java wrapper for LightGBM, to load the model from the previously stored model string:

    LGBMBooster booster = LGBMBooster.loadModelFromString(modelString);
    
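    For completeness, a minimal sketch of the retrieval step might look like the following. It assumes the model string was written to a table named lgbm_models with columns model_id and model_string; the table/column names and connection details are placeholders for illustration, not anything prescribed by LightGBM4j:

    import io.github.metarank.lightgbm4j.LGBMBooster;

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class LgbmModelLoader {

        public static LGBMBooster loadFromPostgres(String modelId) throws Exception {
            // Placeholder connection details -- adjust to your environment.
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/mydb", "user", "password");
                 PreparedStatement stmt = conn.prepareStatement(
                     "SELECT model_string FROM lgbm_models WHERE model_id = ?")) {
                stmt.setString(1, modelId);
                try (ResultSet rs = stmt.executeQuery()) {
                    if (!rs.next()) {
                        throw new IllegalArgumentException("No model stored under id " + modelId);
                    }
                    // Rebuild the booster from the text dump kept in the
                    // character varying column.
                    return LGBMBooster.loadModelFromString(rs.getString("model_string"));
                }
            }
        }
    }

    From there, predictions can be made with the booster's predictForMat methods; the exact signatures differ between lightgbm4j versions, so check the version you depend on.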

    One problem seems to arise when using categorical variables (of type 'category' in Python), since this Java wrapper appears to only support matrices of double or float values [...].
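
    If you stick with LightGBM and categorical features, one workaround is to pass the same integer codes that pandas assigned to the 'category' columns at training time, so the feature matrix on the Java side can stay purely numeric. The sketch below assumes you exported that category-to-code mapping alongside the model (e.g. in another Postgres column); the class and the mapping are made up for illustration:

    import java.util.Map;

    public class CategoricalEncoder {

        // Raw category value -> integer code used at training time
        // (e.g. the position in pandas' .cat.categories). In practice this
        // mapping would be loaded from the database next to the model.
        private final Map<String, Integer> colorCodes = Map.of(
            "red", 0, "green", 1, "blue", 2);

        // Encode one row so it can be fed to the wrapper as a plain double matrix.
        public double[] encodeRow(double numericFeature, String color) {
            Integer code = colorCodes.get(color);
            // Unseen categories become NaN, which LightGBM treats as missing.
            return new double[] { numericFeature, code == null ? Double.NaN : code };
        }
    }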

    With dmlc XGBoost there seems to be a more mature Java wrapper (XGBoost4J), so switching from LightGBM to XGBoost may offer a more stable implementation with somewhat extended functionality in comparison (at least at the time of writing).
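
    For the XGBoost route, an analogous pattern could be to dump the raw booster bytes in Python (e.g. via Booster.save_raw()) into a bytea column and load them in Java with the XGBoost4J bindings. Again, the table/column names and connection details below are placeholders for illustration:

    import ml.dmlc.xgboost4j.java.Booster;
    import ml.dmlc.xgboost4j.java.XGBoost;

    import java.io.ByteArrayInputStream;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class XgbModelLoader {

        public static Booster loadFromPostgres(String modelId) throws Exception {
            // Placeholder connection details -- adjust to your environment.
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/mydb", "user", "password");
                 PreparedStatement stmt = conn.prepareStatement(
                     "SELECT model_bytes FROM xgb_models WHERE model_id = ?")) {
                stmt.setString(1, modelId);
                try (ResultSet rs = stmt.executeQuery()) {
                    if (!rs.next()) {
                        throw new IllegalArgumentException("No model stored under id " + modelId);
                    }
                    // The bytea column holds the raw booster dump produced in Python.
                    byte[] modelBytes = rs.getBytes("model_bytes");
                    return XGBoost.loadModel(new ByteArrayInputStream(modelBytes));
                }
            }
        }
    }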