probability lda topic-modeling mallet

How to get the probability of the top words of each topic in MALLET


I am using LDA in MALLET to explore my data. Training runs without any problem, but I also need the probability of the top words in each topic (let's say the top 20 words).

I use this command:

bin\mallet train-topics  --input tutorial.mallet  --num-topics 40 --optimize-interval 20 --output-state topic-state_doc_40t.gz  --output-topic-keys tutorial_keys_doc_40t.txt --output-doc-topics tutorial_composition_doc_40t.txt

I do not know which option would give me the words' probabilities.


Solution

  • Late answer, but who knows, it might help someone else.

    MALLET 2.0.8 added an option to write a diagnostics file containing a number of metrics for each topic and its top words. Word probability is one of them.

    Simply add --diagnostics-file FILENAME to your train-topics command.

    The number of words reported for each topic is the same as the number set by "--num-top-words".

    Here is the link to the detailed documentation: http://mallet.cs.umass.edu/diagnostics.php. If you don't want to retrain your model, you can still produce the diagnostics file from your "state" file. Everything is described in the link.
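
Once the diagnostics file exists, the word probabilities can be pulled out with a few lines of Python. This is only a minimal sketch: it assumes the file is XML in which each <topic> element has an "id" attribute and contains <word> elements with a "prob" attribute, so check your own file before relying on it. The function name top_word_probs and the SAMPLE text are made up for illustration and are not real MALLET output.

```python
import xml.etree.ElementTree as ET


def top_word_probs(root):
    """Map each topic id to a list of (word, probability) pairs.

    Assumes <topic id='...'> elements that contain
    <word prob='...'>text</word> children (layout is an assumption).
    """
    result = {}
    for topic in root.iter("topic"):
        topic_id = int(topic.get("id"))
        result[topic_id] = [
            (word.text.strip(), float(word.get("prob")))
            for word in topic.iter("word")
        ]
    return result


# Hand-written sample in the assumed layout, for illustration only.
SAMPLE = """<model>
<topic id='0' tokens='100'>
<word rank='1' count='30' prob='0.3'>music</word>
<word rank='2' count='20' prob='0.2'>band</word>
</topic>
</model>"""

if __name__ == "__main__":
    probs = top_word_probs(ET.fromstring(SAMPLE))
    for topic_id, words in probs.items():
        for word, prob in words:
            print(topic_id, word, prob)
```

For a real run, replace ET.fromstring(SAMPLE) with ET.parse(FILENAME).getroot(), where FILENAME is whatever you passed to --diagnostics-file.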