I have Sphinx SE running against an MS SQL Server database, and it has worked great for the past few years. The table Sphinx uses has recently grown a lot, and we need to leverage the speed provided by moving the table to Azure Table Storage.
What options do I have to let Sphinx index this table from Azure? I know it supports MS SQL, but Azure Table Storage is a different beast. I have also found that Sphinx supports XML input, but it would be very hard to export all of this data into a file to be read every 5 minutes. Has anyone conquered this issue using Azure Table Storage?
thanks
Well, XMLpipe (or even TSVpipe) would be the way to connect to the table store, since there is no native SQL-based driver for it.
... but yes, a simple implementation might well load all the data. That is actually what you are possibly doing with MS-SQL already. It's just that the data has been small enough for that to be reasonably practical.
Loading all the data from MS-SQL would be similarly "expensive".
So really your question is more about how to index a 'large' dataset. You need some sort of incremental update system, so that you only have to load the changes. (The fact that it runs against Table Storage then becomes just a trivial detail of the implementation.)
One concept you might see quite a bit in Sphinx is the so-called 'main'+'delta' scheme: http://www.sphinxconsultant.com/sphinx-search-delta-indexing/
That works quite well with XMLpipe too, so it can work with Azure. You just need to come up with a couple of scripts: one to download a large quantity of data (to initially commission the 'main' index; it doesn't get used often)
... then a second script that gets only the new records, by running some sort of query for them.
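As a rough illustration of that "only the new records" query, here is a minimal Python sketch. It assumes you keep the time of the last indexing run somewhere (a watermark) and filter on the `Timestamp` property that Table Storage maintains on every entity. The function name `delta_filter` is made up for this example, and the string it returns would still have to be passed to whatever Azure client you use. As far as I know `Timestamp` is not indexed, so on a big table a filter like this may scan a lot of rows; a date-based PartitionKey might serve better if your data allows it.

```python
from datetime import datetime, timezone


def delta_filter(last_indexed):
    """Build an OData filter for entities modified after `last_indexed`.

    `last_indexed` is expected to be a timezone-aware datetime (hypothetical
    watermark saved by the previous indexing run).
    """
    # Table Storage filters compare against UTC timestamps in ISO 8601 form.
    stamp = last_indexed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return "Timestamp gt datetime'%s'" % stamp
```

The 'main' script would simply skip the filter and read everything.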
You just need some sort of script to stream from Azure and output either XML or TSV: https://www.google.com/search?q=Azure+Table+Storage+stream
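To give an idea of the output side, here is a small Python sketch that writes the xmlpipe2 format Sphinx reads. It is only a sketch under assumptions: `write_xmlpipe2` is a made-up name, the entities are assumed to be plain dicts that already carry a unique integer `id` (Table Storage keys are strings, so you would need your own mapping to numeric document ids), and the `title` and `body` fields are placeholders for your real columns. Fetching the entities from Azure is deliberately left out.

```python
import sys
from xml.sax.saxutils import escape, quoteattr


def write_xmlpipe2(entities, out=sys.stdout, fields=("title", "body")):
    """Stream `entities` (an iterable of dicts) to `out` as xmlpipe2 XML."""
    out.write('<?xml version="1.0" encoding="utf-8"?>\n')
    out.write("<sphinx:docset>\n")
    # Declare the full-text fields up front so sphinx.conf does not have to.
    out.write("<sphinx:schema>\n")
    for name in fields:
        out.write("<sphinx:field name=%s/>\n" % quoteattr(name))
    out.write("</sphinx:schema>\n")
    # One <sphinx:document> per entity, written as it arrives, so the whole
    # dataset never has to sit in memory or in a temporary file.
    for entity in entities:
        doc_id = str(int(entity["id"]))  # assumed numeric id, see note above
        out.write("<sphinx:document id=%s>\n" % quoteattr(doc_id))
        for name in fields:
            value = escape(str(entity.get(name, "")))
            out.write("<%s>%s</%s>\n" % (name, value, name))
        out.write("</sphinx:document>\n")
    out.write("</sphinx:docset>\n")
```

A script built around this would be named in `xmlpipe_command` of an `xmlpipe2` source, once for 'main' (all entities) and once for 'delta' (only the changed ones).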