I am training a chunker using CoreNLP's CRFClassifier and I would like to reduce the size of the generated model file. I thought I could use the featureCountThreshold property to prune uncommon features and thereby reduce the file size, but I have tried several thresholds and the file size is always the same. So either I am doing something wrong or I have misunderstood the featureCountThreshold property.
This is how I instantiate the CRFClassifier:
import java.util.Properties
import edu.stanford.nlp.ie.crf.CRFClassifier
import edu.stanford.nlp.ling.CoreLabel

val props = new Properties()
props.setProperty("macro", "true")
props.setProperty("featureFactory", "edu.arizona.sista.chunker.ChunkingFeatureFactory")
props.setProperty("featureCountThreshold", "10")
new CRFClassifier[CoreLabel](props)
The code is in Scala, but it should be straightforward to follow.
Is this the right way to reduce the file size? And if not, is there a way to accomplish this?
For the next person trying to do this:
There are two properties with similar names in CoreNLP: featureCountThreshold and featureCountThresh. featureCountThresh is the correct one for this task.
We were able to reduce a model from 321M to 54M using a featureCountThresh of 10 and still retain almost the same performance.
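For reference, here is a minimal sketch of the property setup from the question with only the property name changed to featureCountThresh. The ChunkerProps object and its build method are hypothetical names introduced for illustration; the property keys and the feature factory class are the ones from the question. The sketch only builds the Properties object so that it runs without CoreNLP on the classpath.

```scala
import java.util.Properties

// Hypothetical helper (not part of CoreNLP): collects the training properties.
object ChunkerProps {
  // Same settings as in the question, except that the pruning property
  // is spelled featureCountThresh rather than featureCountThreshold.
  def build(thresh: Int): Properties = {
    val props = new Properties()
    props.setProperty("macro", "true")
    props.setProperty("featureFactory", "edu.arizona.sista.chunker.ChunkingFeatureFactory")
    props.setProperty("featureCountThresh", thresh.toString)
    props
  }
}

// With CoreNLP on the classpath, the classifier is then created as in the question:
//   new CRFClassifier[CoreLabel](ChunkerProps.build(10))
```

A threshold of 10 is simply the value that worked for us; the right value for another data set probably needs to be found by comparing model size and accuracy at a few settings.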