I'm trying to find out whether it's possible - and if so, what the best way is - to programmatically compare different topic models created with MALLET, in order to determine the "best" fitting model for a given corpus.
The API offers a method to determine the log likelihood of the generated model. See, for example, #modelLogLikelihood().
As far as I know, it's possible to compare different models based on the log likelihood of held-out data. But this method computes the likelihood of the whole model, I guess? I already checked the source code, but that didn't clear things up.
So my question is: is the output of the above-mentioned method suitable for comparing different topic modeling algorithms such as Hierarchical PAM, LDA, DMR, ... to find out which model (theoretically) represents the corpus best?
The intention of the log likelihood calculation is to provide a metric that is comparable across different models. That said, I wouldn't recommend using it in that way.
First, if you actually care about language model predictive likelihood, you should use one of the many more recent deep neural models.
Second, likelihood is very sensitive to smoothing parameters, so the fact that you get consistent differences may be just an artifact of your own settings. Preprocessing decisions like tokenization and multi-word terms can also have a bigger impact than choice of model.
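To make the smoothing point concrete, here is a toy sketch. It is not MALLET code and not a topic model: it scores held-out tokens under a plain unigram model with additive smoothing. The class name `SmoothingDemo`, the method `heldOutLogLikelihood`, and the tiny corpus are all made up for illustration. The only thing that varies is the smoothing parameter `beta`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy example: NOT part of MALLET, just an illustration.
final class SmoothingDemo {

    // Log likelihood of the held-out tokens under a unigram model that was
    // fit on the training tokens with additive (Dirichlet-style) smoothing:
    //   p(w) = (count(w) + beta) / (N + beta * vocabSize)
    static double heldOutLogLikelihood(List<String> train, List<String> heldOut,
                                       double beta, int vocabSize) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : train) {
            counts.merge(w, 1, Integer::sum);
        }
        double denom = train.size() + beta * vocabSize;
        double logLikelihood = 0.0;
        for (String w : heldOut) {
            logLikelihood += Math.log((counts.getOrDefault(w, 0) + beta) / denom);
        }
        return logLikelihood;
    }

    public static void main(String[] args) {
        List<String> train = Arrays.asList("topic", "model", "topic", "corpus");
        // "likelihood" never occurs in the training data.
        List<String> heldOut = Arrays.asList("topic", "model", "likelihood");
        int vocabSize = 4; // topic, model, corpus, likelihood

        // Same data, same model family; only the smoothing parameter changes.
        for (double beta : new double[] {0.001, 0.01, 0.1, 1.0}) {
            System.out.printf("beta=%.3f  held-out log likelihood=%.4f%n",
                    beta, heldOutLogLikelihood(train, heldOut, beta, vocabSize));
        }
    }
}
```

On this toy data the held-out score rises steadily as `beta` grows, simply because the unseen word gets more probability mass. A difference of that size between two "models" would tell you about the smoothing setting, not about which model describes the corpus better.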
Third, if you are actually interested in topic model output, you should be clear about what you want from the model, and what characteristics of a model make it useful for your specific needs. I like to suggest that people think of a topic model as more like making a map than fitting a regression. The best resolution of the map depends on where you want to go.
Finally, you are almost certainly better off with the simplest model.