Convert Sphinx Index to Table?


I go through a pretty intense Sphinx configuration each day to convert millions of records into a usable, searchable Sphinx index.

However, I now need to export that as an XML file or, failing that, as a new table.

Naturally, I could do most or all of the work I do in the Sphinx index in MySQL as well, but that seems like a lot of unnecessary work when I've just generated a Sphinx index. Can I somehow 'export' that index to a table, or is the full-text index essentially useless to me as readable data?


Solution

  • Well, it depends on WHAT you want out.

    The Sphinx index is essentially an inverted index: https://en.wikipedia.org/wiki/Inverted_index

    ... as such, it's good for finding which 'documents' contain a given word; it literally stores that as a list. (That is ideally suited to the fundamental function of a query! Sphinx just does the heavy lifting for multi-word queries, as well as ranking results.)

    ... but such a structure is NOT organized by document, so you can't directly get a list of which words are in a given document. (To compute that, you would have to traverse the entire data structure.)


    But if it's the inverted index that you DO want, you can dump it with indextool http://sphinxsearch.com/docs/current.html#ref-indextool ... e.g. the --dumpdict and even --dumphitlist commands (although --dumpdict only works on dict=keywords indexes).


    You might also be interested in the --dump-rows option of indexer http://sphinxsearch.com/docs/current.html#ref-indexer ... it dumps out the textual data retrieved from MySQL during indexing.

    Note that this is not dumped from the index itself, so it is not subject to all the 'magic' tokenizing and normalizing Sphinx does (charset_table, wordforms, etc.).


    Back to indextool: there are also the --fold, --htmlstrip and --morph options, which can be used on a stream to tokenize text.

    In theory, you could use these to apply the 'power' of Sphinx, and the settings from an actual index, to create a processed dataset (similar to what Sphinx does when generating an index).
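
To make the inverted-index point concrete, here is a minimal Python sketch. It is a toy model only, not Sphinx's actual on-disk format: the documents, the whitespace tokenizer, and the function names (build_inverted_index, words_in_document, to_rows) are all made up for illustration. It shows why looking up a keyword is cheap, why recovering the words of one document means scanning every keyword, and how a dumped keyword-to-document list could be flattened into rows for a table.

```python
from collections import defaultdict

# Hypothetical toy documents; a real index would cover millions of records.
docs = {
    1: "red apple pie",
    2: "green apple",
    3: "red wine",
}

def build_inverted_index(documents):
    """Map each word to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Naive tokenizer (assumption): lowercase and split on whitespace.
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

def words_in_document(index, doc_id):
    """Recover one document's words; this has to scan every keyword."""
    return sorted(word for word, ids in index.items() if doc_id in ids)

def to_rows(index):
    """Flatten the index into (keyword, doc_id) rows, e.g. to load into a table."""
    return [(word, doc_id) for word in sorted(index) for doc_id in index[word]]

index = build_inverted_index(docs)
print(index["apple"])               # keyword lookup: a single dictionary access
print(words_in_document(index, 1))  # per-document view: full traversal
print(to_rows(index)[:3])           # first few rows of the flattened form
```

Note that this toy version throws away word order, so the original text cannot be rebuilt from it; it only answers "which documents contain this word".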