Catboost offers a dynamic C library, which theoretically can be used from any programming language.
I'm trying to call it through Java using JNA.
I'm having a problem with the CalcModelPrediction function, defined in the header file as follows:
EXPORT bool CalcModelPrediction(
ModelCalcerHandle* calcer,
size_t docCount,
const float** floatFeatures, size_t floatFeaturesSize,
const char*** catFeatures, size_t catFeaturesSize,
double* result, size_t resultSize);
In Java, I've defined the interface function as follows:
public interface CatboostModel extends Library {
public Pointer ModelCalcerCreate();
public String GetErrorString();
public boolean LoadFullModelFromFile(Pointer calcer, String filename);
public boolean CalcModelPrediction(Pointer calcer, int docCount,
PointerByReference floatFeatures, int floatFeaturesSize,
PointerByReference catFeatures, int catFeaturesSize,
Pointer result, int resultSize);
public int GetFloatFeaturesCount(Pointer calcer);
public int GetCatFeaturesCount(Pointer calcer);
}
and then I'm calling it like this:
CatboostModel catboost;
Pointer modelHandle;
catboost = Native.loadLibrary("catboostmodel", CatboostModel.class);
modelHandle = catboost.ModelCalcerCreate();
if (!catboost.LoadFullModelFromFile(modelHandle, "catboost_test.model"))
{
throw new RuntimeException("Cannot load Catboost model.");
}
final PointerByReference ppFloatFeatures = new PointerByReference();
final PointerByReference ppCatFeatures = new PointerByReference();
final Pointer pResult = new Memory(Native.getNativeSize(Double.TYPE));
float[] floatFeatures = {0.5f, 0.8f, 0.3f, 0.3f, 0.1f, 0.5f, 0.4f, 0.8f, 0.3f, 0.3f} ;
String[] catFeatures = {"1", "2", "3", "4"};
int catFeaturesLength = 0;
for (String s : catFeatures)
{
catFeaturesLength += s.length() + 1;
}
try
{
final Pointer pFloatFeatures = new Memory(floatFeatures.length * Native.getNativeSize(Float.TYPE));
for (int dloop=0; dloop<floatFeatures.length; dloop++) {
pFloatFeatures.setFloat(dloop * Native.getNativeSize(Float.TYPE), floatFeatures[dloop]);
}
ppFloatFeatures.setValue(pFloatFeatures);
final Pointer pCatFeatures = new Memory(catFeaturesLength * Native.getNativeSize(Character.TYPE));
long offset = 0;
for (final String s : catFeatures) {
pCatFeatures.setString(offset, s);
pCatFeatures.setMemory(offset + s.length(), 1, (byte)(0));
offset += s.length() + 1;
}
ppCatFeatures.setValue(pCatFeatures);
}
catch (Exception e)
{
throw new RuntimeException("Couldn't initialize parameters for catboost");
}
try
{
if (!catboost.CalcModelPrediction(
modelHandle,
1,
ppFloatFeatures, 10,
ppCatFeatures, 4,
pResult, 1
))
{
throw new RuntimeException("No prediction made: " + catboost.GetErrorString());
}
else
{
double[] result = pResult.getDoubleArray(0, 1);
log.info("Catboost prediction: " + String.valueOf(result[0]));
Assert.assertFalse("ERROR: Result empty", result.length == 0);
}
}
catch (Exception e)
{
throw new RuntimeException("Prediction failed: " + e);
}
I've tried passing Pointer, PointerByReference and Pointer[] to the CalcModelPrediction function in place of float **floatFeatures and char ***catFeatures, but nothing worked. I always get a segmentation fault, presumably when the CalcModelPrediction function attempts to get elements of floatFeatures and catFeatures by calling floatFeatures[0][0] and catFeatures[0][0].
So the question is, what's the right way of passing a multi-dimensional array from Java through JNA into C, where it could be treated as a pointer to a pointer to a value?
An interesting thing is that the CalcModelPredictionFlat function, which accepts only float **floatFeatures and then simply dereferences it as *floatFeatures, works perfectly fine when passing PointerByReference.
UPDATE - 5.5.2018
Part 1
After trying to debug the segfault by slightly modifying the original Catboost .cpp and .h files and recompiling the libcatboost.so library, I found out that the segfault was due to me mapping size_t in C to int in Java. After fixing this, my interface function in Java looks like this:
public interface CatboostModel extends Library {
public boolean LoadFullModelFromFile(Pointer calcer, String filename);
public boolean CalcModelPrediction(Pointer calcer, size_t docCount,
Pointer[] floatFeatures, size_t floatFeaturesSize,
String[] catFeatures, size_t catFeaturesSize,
Pointer result, size_t resultSize);
}
Where the size_t class is defined as follows:
public static class size_t extends IntegerType {
public size_t() { this(0); }
public size_t(long value) { super(Native.SIZE_T_SIZE, value); }
}
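To make the width mismatch concrete, here is a minimal plain-Java sketch (no JNA involved). It assumes a 64-bit little-endian platform where size_t is 8 bytes, and it only illustrates what happens when 4-byte values are read back as 8-byte ones; it is not a model of any particular calling convention. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class SizeTWidthDemo {
    // Stores each argument as a 4-byte little-endian int (what a Java int
    // mapping supplies), then reads the same bytes back 8 at a time (what
    // code expecting a 64-bit size_t would consume).
    static long[] readAsSizeT(int... values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (int v : values) {
            buf.putInt(v);
        }
        buf.flip();
        // Any trailing 4 bytes that do not fill a whole 8-byte slot are ignored.
        long[] out = new long[buf.remaining() / Long.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getLong();
        }
        return out;
    }

    public static void main(String[] args) {
        // docCount=1, floatFeaturesSize=10, catFeaturesSize=4, resultSize=1
        // prints [42949672961, 4294967300]
        System.out.println(Arrays.toString(readAsSizeT(1, 10, 4, 1)));
    }
}
```

Two neighbouring small ints fuse into one huge "size", which is the kind of value that sends a loop far past the end of a buffer.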
Part 2
Looking more into the Catboost code, I noticed that **floatFeatures is accessed by rows, like floatFeatures[i], while ***catFeatures is accessed by rows and columns, like catFeatures[i][catFeatureIdx].
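The difference in pointer depth can be sketched in plain Java by treating a ByteBuffer as a tiny "native heap" in which pointers are just 8-byte offsets. This is only a model under my own assumptions (64-bit pointers, little-endian, hypothetical class and method names), but it shows why float ** needs one table of row pointers while char *** needs a table of tables of string pointers.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class PointerDepthDemo {
    private static final int PTR = Long.BYTES;            // assumed pointer width
    private final ByteBuffer heap =
            ByteBuffer.allocate(1024).order(ByteOrder.LITTLE_ENDIAN);
    private int top = PTR;                                 // offset 0 stands in for NULL

    // Bump allocator: returns the start offset, rounded up to pointer alignment.
    private int alloc(int size) {
        int addr = top;
        top += (size + PTR - 1) / PTR * PTR;
        return addr;
    }

    // One document's float features: a contiguous float[] (a float*).
    int storeFloats(float... row) {
        int addr = alloc(row.length * Float.BYTES);
        for (int j = 0; j < row.length; j++) {
            heap.putFloat(addr + j * Float.BYTES, row[j]);
        }
        return addr;
    }

    // A NUL-terminated C string (a char*).
    int storeCString(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        int addr = alloc(bytes.length + 1);
        for (int k = 0; k < bytes.length; k++) {
            heap.put(addr + k, bytes[k]);
        }
        heap.put(addr + bytes.length, (byte) 0);
        return addr;
    }

    // A table of pointers (float** when it points at rows, char** or char*** for strings).
    int storePointers(int... targets) {
        int addr = alloc(targets.length * PTR);
        for (int k = 0; k < targets.length; k++) {
            heap.putLong(addr + k * PTR, targets[k]);
        }
        return addr;
    }

    // One document's categorical features: a table of char* (a char**).
    int storeStrings(String... values) {
        int[] ptrs = new int[values.length];
        for (int k = 0; k < values.length; k++) {
            ptrs[k] = storeCString(values[k]);
        }
        return storePointers(ptrs);
    }

    // floatFeatures[doc][j]: one pointer hop, then a float read.
    float floatAt(int floatFeatures, int doc, int j) {
        int row = (int) heap.getLong(floatFeatures + doc * PTR);
        return heap.getFloat(row + j * Float.BYTES);
    }

    // catFeatures[doc][j]: two pointer hops, then a scan to the NUL terminator.
    String catAt(int catFeatures, int doc, int j) {
        int row = (int) heap.getLong(catFeatures + doc * PTR);
        int str = (int) heap.getLong(row + j * PTR);
        int end = str;
        while (heap.get(end) != 0) {
            end++;
        }
        return new String(heap.array(), str, end - str, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        PointerDepthDemo m = new PointerDepthDemo();
        int ff = m.storePointers(m.storeFloats(0.4f, 0.8f), m.storeFloats(0.1f, 0.5f));
        int cf = m.storePointers(m.storeStrings("1", "2"), m.storeStrings("3", "4"));
        System.out.println(m.floatAt(ff, 1, 0) + " " + m.catAt(cf, 1, 1));
    }
}
```

If the outer table holds the characters themselves instead of pointers to per-document tables, the second hop in catAt follows string bytes as if they were an address, which matches the kind of crash described above.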
After changing floatFeatures in Java to an array of Pointer, my code started to work with a model trained without categorical features, i.e. when the catFeatures length is zero.
This trick, however, didn't work with catFeatures, which are accessed through a double subscript operator, [i][catFeatureIdx]. So for now, I modified the original Catboost code so that it accepts char **catFeatures, an array of strings. In the Java interface function, I declared String[] catFeatures. Now I can make predictions for only one element at a time, which is not ideal.
I've managed to make it all work with the original Catboost code and libcatboost.so.
The Java interface function is defined like this. Note that in order to emulate a 2D array (or a pointer to a pointer) of float values and strings, I'm using the Pointer[] type.
public interface CatboostModel extends Library {
public boolean LoadFullModelFromFile(Pointer calcer, String filename);
public boolean CalcModelPrediction(Pointer calcer, size_t docCount,
Pointer[] floatFeatures, size_t floatFeaturesSize,
Pointer[] catFeatures, size_t catFeaturesSize,
Pointer result, size_t resultSize);
}
After that, I populate the floatFeatures and catFeatures parameters like this (some dummy data here). Note that for strings I'm using JNA's StringArray.
float[] floatFeatures = {0.4f, 0.8f, 0.3f, 0.3f, 0.1f, 0.5f, 0.4f, 0.8f, 0.3f, 0.3f} ;
String[] catFeatures = {"1", "2", "3", "4"};
final Pointer pFloatFeatures = new Memory(floatFeatures.length * Native.getNativeSize(Float.TYPE));
final Pointer[] ppFloatFeatures = new Pointer[2];
for (int dloop=0; dloop<floatFeatures.length; dloop++) {
pFloatFeatures.setFloat(dloop * Native.getNativeSize(Float.TYPE), floatFeatures[dloop]);
}
ppFloatFeatures[0] = pFloatFeatures;
ppFloatFeatures[1] = pFloatFeatures;
// One entry per document (2 here), not one per categorical feature.
final Pointer[] ppCatFeatures = new Pointer[2];
final Pointer pCatFeatures = new StringArray(catFeatures);
ppCatFeatures[0] = pCatFeatures;
ppCatFeatures[1] = pCatFeatures;
Finally, I pass these parameters to Catboost:
if (!catboost.CalcModelPrediction(
modelHandle,
new size_t(2L),
ppFloatFeatures, new size_t((long)floatFeatures.length),
ppCatFeatures, new size_t((long)catFeatures.length),
pResult, new size_t(2L)
))
{
throw new RuntimeException("No prediction made: " + catboost.GetErrorString());
}
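To get the predictions (pResult must have been allocated with room for two doubles, one per document, e.g. new Memory(2 * Native.getNativeSize(Double.TYPE))), we can do: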
double[] result = pResult.getDoubleArray(0, 2);
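As a rough plain-Java illustration of the sizing rule behind that read, the sketch below decodes docCount doubles from a raw buffer and refuses to read past its end. It assumes one double per document (a model with a single output value); ResultBuffer and its methods are hypothetical names, and a ByteBuffer stands in for the native result memory.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class ResultBuffer {
    // Bytes needed for one double per document; throws on int overflow.
    static int requiredBytes(int docCount) {
        return Math.multiplyExact(docCount, Double.BYTES);
    }

    // Reads docCount doubles starting at the buffer's current position,
    // using the buffer's own byte order and leaving its position unchanged.
    static double[] decode(ByteBuffer raw, int docCount) {
        if (raw.remaining() < requiredBytes(docCount)) {
            throw new IllegalArgumentException(
                    "result buffer holds " + raw.remaining()
                    + " bytes, need " + requiredBytes(docCount));
        }
        double[] out = new double[docCount];
        for (int i = 0; i < docCount; i++) {
            out[i] = raw.getDouble(raw.position() + i * Double.BYTES);
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer raw = ByteBuffer.allocate(requiredBytes(2)).order(ByteOrder.nativeOrder());
        raw.putDouble(0.25).putDouble(-1.5);
        raw.flip();
        // prints [0.25, -1.5]
        System.out.println(Arrays.toString(decode(raw, 2)));
    }
}
```

A buffer sized for a single double, as in the first version of the calling code, would fail this check for docCount = 2 instead of letting the native side write past the allocation.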