I have a question about Mallet topic modelling: how does it set its default hyperparameters for LDA, i.e., alpha and beta?
The default for alpha is 5.0 divided by the number of topics. You can think of this as five "pseudo-words" of weight on the uniform distribution over topics: if a document is short, we expect to stay closer to the uniform prior; if it is long, we can feel more confident moving away from it.
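For concreteness, here is a minimal sketch of that arithmetic against MALLET's Java API. Note that the value you pass on the command line (`--alpha`) or to the `ParallelTopicModel` constructor is the sum over topics, and the per-topic alpha is that sum divided by the number of topics. The topic count below is just an example, and you should double-check the constructor signature against your MALLET version:

```java
import cc.mallet.topics.ParallelTopicModel;

public class DefaultAlpha {
    public static void main(String[] args) {
        int numTopics = 100;                          // example topic count
        double alphaSum = 5.0;                        // MALLET's default --alpha (sum over topics)
        double alphaPerTopic = alphaSum / numTopics;  // 5.0 / 100 = 0.05 per topic
        System.out.printf("alpha_k = %.4f%n", alphaPerTopic);

        // The three-argument constructor takes the alpha *sum*, not the per-topic value.
        ParallelTopicModel model = new ParallelTopicModel(numTopics, alphaSum, 0.01);
        System.out.println("Built model with " + numTopics + " topics.");
    }
}
```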
With hyperparameter optimization, the alpha value for each topic can be different. The optimized values usually become smaller than the default setting.
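Optimization is off by default; you enable it with the `--optimize-interval` option on the command line, or via the setter on `ParallelTopicModel`. A minimal sketch (the interval and burn-in values here are common choices rather than mandated defaults, and the method names should be verified against your MALLET version):

```java
import cc.mallet.topics.ParallelTopicModel;

public class OptimizedAlpha {
    public static void main(String[] args) throws Exception {
        ParallelTopicModel model = new ParallelTopicModel(100, 5.0, 0.01);

        // Re-estimate the hyperparameters every 10 sampling iterations,
        // which makes alpha asymmetric (one value per topic).
        model.setOptimizeInterval(10);
        model.setBurninPeriod(20);  // skip early, noisy iterations before optimizing

        // With real data you would continue:
        // model.addInstances(instances);  // instances: a cc.mallet.types.InstanceList
        // model.estimate();
    }
}
```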
The default value for beta is 0.01. This means that each topic places a total weight on the uniform prior over the vocabulary equal to the vocabulary size divided by 100. This seems to be a good value in practice. With optimization turned on, the value rarely changes by more than a factor of two.
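To make the "vocabulary size divided by 100" point concrete, a tiny sketch (the vocabulary size here is made up purely for illustration):

```java
public class BetaMass {
    public static void main(String[] args) {
        double beta = 0.01;     // MALLET's default per-word beta
        int vocabSize = 20000;  // hypothetical vocabulary size, for illustration

        // Total "pseudo-word" mass each topic places on the uniform
        // distribution over the vocabulary: beta * V = V / 100.
        double betaSum = beta * vocabSize;
        System.out.printf("betaSum = %.1f pseudo-words%n", betaSum);  // 200.0 here
    }
}
```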