Currently, a SQL '%like%' search is used to get all the rows that contain certain keywords. We're trying to replace the MySQL LIKE search with Lucene/Solr.
We constructed the indexes,
and it got slower. Damn!
I suppose the bandwidth used in steps 1-3 is the cause (since the result set is really huge, 1 million+ rows), but I can't figure out any better way.
Is there any other way to get Solr search results besides CSV over HTTP (like a file dump in MySQL)?
We used the same procedure to combine Solr and MySQL, and it was 100-1000x faster than plain MySQL fulltext search.
So your workflow/procedure is not a problem in general. The question is: where is your bottleneck? To investigate that, take a look at catalina.out to see the query time of each Solr request. Do the same on MySQL: look at the query times and the slow query log.
We had a performance problem because the number of returned PKs was very large, so the MySQL query became very long because of a huge WHERE IN () clause. That very large MySQL statement then returned lots of rows, 200 to 1,000,000+.
But the point is that the application/user does not need such a big amount of data at once. So we decided to work with pagination and offset (on the Solr side). Solr now returns only 30-50 results per request (depending on the pagination settings of the user's application environment); a sketch follows below.
This works very fast.
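As a minimal sketch of that pagination, assuming a Solr core reachable at http://localhost:8983/solr/mycore and hypothetical field/table names (text, id, articles), the start and rows parameters keep both the Solr response and the resulting WHERE IN () clause small:

```python
import requests

SOLR_URL = "http://localhost:8983/solr/mycore/select"  # hypothetical core URL

def fetch_page_ids(keyword, page, per_page=30):
    """Ask Solr for one small page of primary keys instead of the full result set."""
    params = {
        "q": f"text:{keyword}",    # 'text' is an assumed indexed field
        "fl": "id",                # return only the PK field
        "start": page * per_page,  # offset into the result set
        "rows": per_page,          # page size, e.g. the 30-50 mentioned above
        "wt": "json",
    }
    resp = requests.get(SOLR_URL, params=params)
    resp.raise_for_status()
    return [doc["id"] for doc in resp.json()["response"]["docs"]]

# The WHERE IN () clause now stays short, one page at a time:
ids = fetch_page_ids("keyword", page=0)
placeholders = ", ".join(["%s"] * len(ids))
sql = f"SELECT * FROM articles WHERE id IN ({placeholders})"  # 'articles' is hypothetical
```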
//Edit: Is there any other way to get Solr search results besides CSV over HTTP?
There are different response formats, like XML, PHP, CSV, Python, Ruby, and JSON. To change the format, use the wt
parameter, like ...&wt=json - see the comparison sketch after the links below.
http://wiki.apache.org/solr/CoreQueryParameters#wt
http://wiki.apache.org/solr/QueryResponseWriter
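For illustration, here is the same hypothetical query from above fetched once as CSV and once as JSON, switching only the wt parameter:

```python
import requests

SOLR_URL = "http://localhost:8983/solr/mycore/select"  # hypothetical core URL
params = {"q": "text:keyword", "fl": "id", "rows": 50}

# wt=csv: plain text, a header line followed by one PK per line
csv_body = requests.get(SOLR_URL, params={**params, "wt": "csv"}).text

# wt=json: parsed straight into Python structures, no CSV parsing needed
docs = requests.get(SOLR_URL, params={**params, "wt": "json"}).json()["response"]["docs"]
```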
//Edit #2
An additional way would be to not only index the data in Solr, but also store it there, so you can fetch the data directly from Solr and live without the MySQL data. Whether that is a way for you depends on your data...
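A minimal sketch of that idea, again with the hypothetical core URL and field names from above: if the fields are stored (not just indexed) in the schema, Solr can return the full documents and the MySQL round trip disappears.

```python
import requests

SOLR_URL = "http://localhost:8983/solr/mycore/select"  # hypothetical core URL

params = {
    "q": "text:keyword",
    "fl": "id,title,body",  # hypothetical fields, stored="true" in the schema
    "rows": 30,
    "wt": "json",
}
docs = requests.get(SOLR_URL, params=params).json()["response"]["docs"]

for doc in docs:
    # The documents are used directly; no WHERE IN () query against MySQL
    print(doc["id"], doc["title"])
```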