I have problems using PMML models with many input fields in JPMML (Scala). Here is a minimal example: load an image, resize it to 300x150 pixels, and use it as input for a PCA (Python):
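import numpy as np
import PIL.Image
from sklearn.decomposition import PCA
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

img = PIL.Image.open(filename)
img = img.resize(STANDARD_SIZE)  # 300x150
# Average the channels of each pixel: 300 * 150 = 45,000 values per image
img = np.array([int(np.mean(a)) for a in img.getdata()])
pca = PCA(svd_solver=pca_method, n_components=components)
train = pca.fit_transform(train_x)
pipeline = PMMLPipeline([('pca', pca), ('knn', neigh)])
sklearn2pmml(pipeline, "/tmp/pca.pmml")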
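To make the input size concrete, here is a minimal stdlib-only sketch of the flattening step. The helper name flatten_pixels and the synthetic pixel data are assumptions for illustration and are not part of my actual code; the point is only that a 300x150 image ends up as 45,000 scalar features, and each of them becomes one PMML input field.

```python
STANDARD_SIZE = (300, 150)  # (width, height), as in the snippet above

def flatten_pixels(pixels):
    """Average the channels of each pixel and return one int per pixel."""
    return [int(sum(p) / len(p)) for p in pixels]

# Synthetic stand-in for the image's pixel data: one (R, G, B) tuple per pixel.
width, height = STANDARD_SIZE
pixels = [(x % 256, y % 256, 0) for y in range(height) for x in range(width)]

features = flatten_pixels(pixels)
print(len(features))  # one input field per pixel
```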
In a second step, this model should be loaded using JPMML (Scala):
val evaluator = new LoadingModelEvaluatorBuilder()
  .setLocatable(false)
  .load(new File("/tmp/pca.pmml"))
  .build()
evaluator.verify()
This leads to the following, fairly self-explanatory exception:
Exception in thread "main" org.jpmml.evaluator.InvalidElementException: Model has too many input fields
    at org.jpmml.evaluator.ModelEvaluatorBuilder.checkSchema(ModelEvaluatorBuilder.java:135)
    at org.jpmml.evaluator.ModelEvaluatorBuilder.build(ModelEvaluatorBuilder.java:115)
    ...
If you look at the source code, you can find the following limit in ModelEvaluatorBuilder:
if((inputFields.size() + groupFields.size()) > 1000){
    throw new InvalidElementException("Model has too many input fields", miningSchema);
}
So my 45k input fields are far too many. If I understand the PMML documentation correctly, I can only use atomic data types (int, char, double, etc.) for the input fields.
Any ideas how I can actually work around this limit?
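For reference, the field count can be estimated from the PMML file itself before handing it to JPMML. The following is a rough sketch under stated assumptions: it counts MiningField elements whose usageType is "active" (the PMML default when the attribute is missing), which only approximates what the evaluator counts as input and group fields, and it would overcount for nested models. The function name and the tiny sample document are hypothetical and the sample is not schema-complete.

```python
import xml.etree.ElementTree as ET

JPMML_FIELD_LIMIT = 1000  # threshold from the check quoted above

def count_active_mining_fields(pmml_text):
    """Count MiningField elements whose usageType is 'active' (the PMML default)."""
    root = ET.fromstring(pmml_text)
    count = 0
    for elem in root.iter():
        # Drop the XML namespace so the sketch works across PMML schema versions.
        tag = elem.tag.rsplit('}', 1)[-1]
        if tag == "MiningField" and elem.get("usageType", "active") == "active":
            count += 1
    return count

# Tiny hand-written fragment, used only to exercise the function.
sample = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <NearestNeighborModel>
    <MiningSchema>
      <MiningField name="y" usageType="target"/>
      <MiningField name="x1"/>
      <MiningField name="x2" usageType="active"/>
    </MiningSchema>
  </NearestNeighborModel>
</PMML>"""

n_fields = count_active_mining_fields(sample)
print(n_fields, n_fields > JPMML_FIELD_LIMIT)
```

For a real file, the text would be read from disk first, for example with open("/tmp/pca.pmml").read().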
You can override the ModelEvaluatorBuilder#checkSchema(ModelEvaluator) method with your own checking logic (such as "accept everything"):
evaluator = new LoadingModelEvaluatorBuilder(){
    @Override
    protected void checkSchema(ModelEvaluator<?> modelEvaluator){
        // Anything goes - I'm willing to accept the responsibility for my own actions
    }
}
    .setLocatable(false)
    .load(new File("/tmp/pca.pmml"))
    .build();
This sanity check is there for a reason. (J)PMML is not meant for processing binary blobs (such as images), and it's a really bad idea to represent an image object as 45k double fields.
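If you would rather stay under the limit than disable the check, one possible direction (a suggestion only, not something JPMML requires) is to shrink the image to a compact feature vector outside of PMML and train the model on that. Below is a minimal Python sketch, assuming a row-major grayscale pixel list; the helper name block_average and the tile size of 10 are hypothetical choices for illustration.

```python
def block_average(pixels, width, height, block):
    """Downsample a row-major grayscale image by averaging block x block tiles."""
    if width % block or height % block:
        raise ValueError("block must divide both width and height")
    out = []
    for by in range(0, height, block):
        for bx in range(0, width, block):
            total = 0
            for y in range(by, by + block):
                row_start = y * width
                total += sum(pixels[row_start + bx : row_start + bx + block])
            out.append(total / (block * block))
    return out

# Synthetic 300x150 grayscale image, one value per pixel.
width, height = 300, 150
gray = [(x + y) % 256 for y in range(height) for x in range(width)]

features = block_average(gray, width, height, block=10)
print(len(features))  # 30 * 15 tiles
```

With 10x10 tiles the model would see 450 input fields instead of 45,000. The PCA and kNN steps would have to be retrained on the reduced features, and whether that much downsampling preserves enough detail depends on the images.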