c# sql-server azure sphinx azure-table-storage

Using Sphinx Search With Azure Table Storage


I have Sphinx SE running against an MS SQL Server database currently, and it has worked great for the past few years. The table Sphinx indexes has recently grown a lot, and we need to leverage the speed provided by moving the table to Azure Table Storage.

What options do I have to allow Sphinx to index this table from Azure? I know it supports MS SQL, but the Azure Table Storage offering is a different beast. I have also found that Sphinx supports an XML input, but it would be very hard to export all of this data into a file to be read every 5 minutes. Has anyone conquered this issue using Azure Table Storage?

Thanks


Solution

  • Well, XMLpipe (or even TSVpipe) would be the way to connect to the table store, lacking a native SQL-based driver.

    ... but yes, a simple implementation might well load all the data. Which is actually what you are possibly doing with MS-SQL. It's just that the data is actually small enough that it's reasonably practical.

    Loading all the data from MS-SQL would be similarly "expensive".

    So really your question is more about how to index a 'large' dataset: some sort of incremental update system, so you only need to load the 'changes'. (The fact that it's running against Table Storage then becomes just a trivial implementation detail.)

    One concept you might see quite a bit in Sphinx is the so-called 'main'+'delta' scheme: http://www.sphinxconsultant.com/sphinx-search-delta-indexing/

    That works quite well with XMLpipe too, so it can work with Azure. You just need to come up with a couple of scripts: one to download the large quantity of data (to initially commission the 'main'; it doesn't get used often)

    ... then a second script to only get the new records, by running some sort of query for rows added or changed since the last index run.

    You just need some sort of script to stream from Azure and output either XML or TSV: https://www.google.com/search?q=Azure+Table+Storage+stream
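As a rough idea of what such a script could look like, here is a minimal Python sketch, not a finished implementation. It has two parts: a helper that builds an OData filter on the Table Storage system property `Timestamp` (for the 'delta' run), and a generator that turns rows into Sphinx's xmlpipe2 XML. The function names `delta_filter` and `xmlpipe2_lines`, the numeric `id` property and the field names are all made up for illustration. Note that Sphinx wants a unique non-zero integer document id, while Table Storage keys are strings, so this assumes you store (or derive) a numeric id on each entity.

```python
from datetime import datetime, timezone
from typing import Any, Iterable, Iterator, Mapping, Sequence
from xml.sax.saxutils import escape, quoteattr


def delta_filter(last_indexed: datetime) -> str:
    """Build an OData filter for entities modified after `last_indexed`.

    Assumes `last_indexed` is timezone-aware; it is converted to UTC.
    """
    stamp = last_indexed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"Timestamp gt datetime'{stamp}'"


def xmlpipe2_lines(
    rows: Iterable[Mapping[str, Any]], fields: Sequence[str]
) -> Iterator[str]:
    """Yield an xmlpipe2 document stream, one chunk of text at a time.

    `rows` can be any iterable of dict-like entities, so nothing has to be
    held in memory or written to an intermediate file. Each row is assumed
    (hypothetically) to carry an integer `id` property, and the names in
    `fields` are assumed to be valid XML element names.
    """
    yield '<?xml version="1.0" encoding="utf-8"?>\n'
    yield "<sphinx:docset>\n"

    # Declare the full-text fields up front so Sphinx knows the schema.
    yield "<sphinx:schema>\n"
    for name in fields:
        yield f"<sphinx:field name={quoteattr(name)}/>\n"
    yield "</sphinx:schema>\n"

    for row in rows:
        doc_id = int(row["id"])  # Sphinx needs a non-zero integer id
        yield f"<sphinx:document id={quoteattr(str(doc_id))}>\n"
        for name in fields:
            # Escape &, < and > so arbitrary text cannot break the XML.
            yield f"<{name}>{escape(str(row.get(name, '')))}</{name}>\n"
        yield "</sphinx:document>\n"

    yield "</sphinx:docset>\n"
```

The script that Sphinx runs via `xmlpipe_command` would then just do `sys.stdout.writelines(xmlpipe2_lines(rows, ["title", "body"]))`. Where `rows` comes from is up to you. With the `azure-data-tables` package it could be something like `TableClient.query_entities(query_filter=delta_filter(checkpoint))` for the 'delta' and `list_entities()` for the 'main', but treat that wiring as an assumption and check it against the SDK version you use.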