I'm restricted to a PostgreSQL database as 'model storage' for the models themselves or their respective components (coefficients, ...). Obviously, PostgreSQL is far from being a fully-fledged model storage, so I can't rule out that I have to implement the whole model training process in Java [...].

I couldn't find a solution that involves a PostgreSQL database as intermediate storage for the models. Writing files directly to disk or other storage isn't really an option for me. I considered calling Python code from within the Java application, but I don't know whether this would be an efficient solution for subsequent inference tasks and beyond [...]. Are there ways to serialize to PMML or other formats that can be loaded via Java implementations of the algorithms? Or ways to use the model definitions/parameters directly to reproduce the model [...]?
One way I found is to save the LightGBM model to a string (`str`) in Python and subsequently store it in a respective Postgres column of type `character varying`:
```python
# booster_ and best_iteration_ are attributes of the fitted LGBMModel
model_string = booster_.model_to_string(num_iteration=best_iteration_)
```
In Java, you can use LightGBM4j, an (unofficial) Java wrapper for LightGBM, to load the model from the previously stored model string:

```java
LGBMBooster booster = LGBMBooster.loadModelFromString(modelString);
```
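To give an end-to-end idea, here is a minimal sketch of the Java side that fetches the stored string from Postgres via plain JDBC and rebuilds the booster. The `models` table, its `model_text`/`created_at` columns, and the connection details are hypothetical placeholders; the import path assumes the `io.github.metarank:lightgbm4j` artifact:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import io.github.metarank.lightgbm4j.LGBMBooster;

public class ModelLoader {

    // Fetch the most recently stored model string and rebuild the booster.
    // Table and column names are placeholders for this sketch.
    public static LGBMBooster loadLatestModel(String jdbcUrl, String user, String password) throws Exception {
        String sql = "SELECT model_text FROM models ORDER BY created_at DESC LIMIT 1";
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
             PreparedStatement stmt = conn.prepareStatement(sql);
             ResultSet rs = stmt.executeQuery()) {
            if (!rs.next()) {
                throw new IllegalStateException("No model found in the models table");
            }
            return LGBMBooster.loadModelFromString(rs.getString("model_text"));
        }
    }
}
```

Loading the booster once at startup and reusing it for all subsequent predictions avoids re-parsing the model string on every inference call.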
One problem seems to arise when using categorical variables (of type `category` in Python), since this Java wrapper seems to only support matrices of `double` or `float` values [...].
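A possible workaround, assuming LightGBM expects the same non-negative integer codes that pandas assigned to the categories during training, is to export that category-to-code mapping alongside the model and replicate it on the Java side before filling the numeric input matrix. A sketch with invented mapping values:

```java
import java.util.HashMap;
import java.util.Map;

public class CategoryEncoder {

    // Must reproduce the exact pandas category codes used at training time;
    // in practice this mapping would be exported together with the model
    // (the entries below are invented for this sketch).
    private static final Map<String, Integer> COLOR_CODES = new HashMap<>();
    static {
        COLOR_CODES.put("blue", 0);
        COLOR_CODES.put("green", 1);
        COLOR_CODES.put("red", 2);
    }

    // LightGBM's docs state that negative values in categorical features are
    // treated as missing, so unseen categories are mapped to -1 here.
    public static double encode(String value) {
        return COLOR_CODES.getOrDefault(value, -1);
    }
}
```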
With dmlc XGBoost there seems to be a more 'final' (mature) Java wrapper for XGBoost (XGBoost4J), so switching from LightGBM to XGBoost may offer a more stable solution with some extended functionality in comparison (at least at the time of writing).
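The Postgres-as-storage pattern should carry over to XGBoost as well: XGBoost4J's `XGBoost.loadModel` accepts an `InputStream`, so the raw model bytes (e.g. produced with `Booster.save_raw()` on the Python side) could be kept in a `bytea` column. A sketch, again with hypothetical table and column names:

```java
import java.io.ByteArrayInputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import ml.dmlc.xgboost4j.java.Booster;
import ml.dmlc.xgboost4j.java.XGBoost;

public class XgbModelLoader {

    // Read the raw model bytes from a hypothetical bytea column and
    // reconstruct the booster via XGBoost4J's InputStream-based loader.
    public static Booster loadLatestModel(String jdbcUrl, String user, String password) throws Exception {
        String sql = "SELECT model_bin FROM models ORDER BY created_at DESC LIMIT 1";
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
             PreparedStatement stmt = conn.prepareStatement(sql);
             ResultSet rs = stmt.executeQuery()) {
            if (!rs.next()) {
                throw new IllegalStateException("No model found in the models table");
            }
            return XGBoost.loadModel(new ByteArrayInputStream(rs.getBytes("model_bin")));
        }
    }
}
```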