Reducing CRFClassifier model file size


I am training a chunker using CoreNLP's CRFClassifier and I would like to reduce the size of the generated model file. I thought I could use the featureCountThreshold property to discard uncommon features and so shrink the file. However, I have tried several thresholds and the file size is always the same, so either I am doing something wrong or I have misunderstood what featureCountThreshold does.

This is how I instantiate the CRFClassifier:

import java.util.Properties
import edu.stanford.nlp.ie.crf.CRFClassifier
import edu.stanford.nlp.ling.CoreLabel

val props = new Properties()
props.setProperty("macro", "true")
props.setProperty("featureFactory", "edu.arizona.sista.chunker.ChunkingFeatureFactory")
props.setProperty("featureCountThreshold", "10")
new CRFClassifier[CoreLabel](props)

The code is in Scala, but it should be straightforward to follow.

Is this the right way to reduce the file size? And if not, is there a way to accomplish this?


Solution

  • For the next person trying to do this:

    There are two properties with similar names in CoreNLP: featureCountThreshold and featureCountThresh. The one to use for this task is featureCountThresh. With featureCountThresh set to 10, we were able to reduce a model from 321M to 54M while retaining almost the same performance.
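
    As a minimal sketch of that change, the properties from the question could be built with the key swapped to featureCountThresh. The helper name chunkerProps is hypothetical and only for illustration; the property keys and the feature factory class are the ones quoted in the question and in this answer. The block uses only java.util.Properties, so the CRFClassifier call is left as a comment and assumes CoreNLP is on the classpath.

```scala
import java.util.Properties

// Hypothetical helper: builds the training properties from the question,
// but sets featureCountThresh instead of featureCountThreshold.
def chunkerProps(minFeatureCount: Int): Properties = {
  val props = new Properties()
  props.setProperty("macro", "true")
  props.setProperty("featureFactory", "edu.arizona.sista.chunker.ChunkingFeatureFactory")
  // The key this answer reports as the one that actually reduced the model size.
  props.setProperty("featureCountThresh", minFeatureCount.toString)
  props
}

val props = chunkerProps(10)
// With CoreNLP on the classpath, pass the properties to the constructor
// exactly as in the question:
//   new CRFClassifier[CoreLabel](props)
```

    The threshold that works best will depend on the data, so it is worth retraining with a few values and comparing both file size and accuracy.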