
Can Sphinx count all words in its index, taking morphology into account?


I want to rank the most frequent words in a Sphinx index. The only method I have found is /usr/bin/indexer -c /etc/sphinxsearch/sphinx.conf indexname --buildfreqs --buildstops /home/user/test.txt 1000. However, this method doesn't take morphology into account: one word in different forms is counted as several separate words. Is there another way to count all indexed words?


Solution

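  • As noted in the comments, you can use indextool --dumpdict, which should give the word counts from the index. Because it reads the index itself, the keywords have already been normalized according to charset_table, wordforms, and morphology, so the different forms of a word are counted under a single entry.

    (This only works on a dict=keywords index; a dict=crc index stores only keyword checksums, not the words themselves.)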
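A minimal sketch of ranking that dump by frequency with standard tools, assuming each dictionary line is printed as comma-separated keyword,docs,hits,offset (check the header your Sphinx version prints and adjust the column if it differs). The function name top_keywords is hypothetical; lines whose third field is not a number (headers, banners) are skipped.

```shell
#!/bin/sh
# Hypothetical helper: rank keywords from an "indextool --dumpdict" dump by hit count.
# ASSUMPTION: dictionary lines look like "keyword,docs,hits,offset".
top_keywords() {
    limit="${1:-1000}"
    # Keep only lines whose 3rd comma-separated field is numeric; emit "hits<TAB>keyword".
    awk -F, 'NF >= 3 && $3 ~ /^[0-9]+$/ { print $3 "\t" $1 }' |
        # Highest hit count first; ties broken alphabetically by keyword.
        sort -t "$(printf '\t')" -k1,1nr -k2,2 |
        head -n "$limit"
}

# Usage (config path and index name as in the question):
# indextool -c /etc/sphinxsearch/sphinx.conf --dumpdict indexname | top_keywords 1000
```

To rank by the number of documents containing each keyword rather than by total occurrences, use $2 instead of $3 in the awk program.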