I have a database table containing ~30 GB of data, which I am indexing with DIH. Indexing takes only 1 hour 15 minutes, but search is very slow: a query takes around 1 minute, which doesn't seem right. Please help if you have faced the same issue.
I am providing the contents of the files.
data-config.xml
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://Battrdbtest20/test_results"
batchSize="-1"
user="results"
password="resultsloader"/>
<document>
<entity name="Syndrome"
pk="test_file_result_id"
query="SELECT * FROM Syndrome">
<Field column="test_file_result_id" name="test_file_result_id"/>
<Field column="syndrome" name="syndrome"/>
</entity>
</document>
</dataConfig>
schema.xml (only the fields were changed to suit my data)
<fields>
<field name="test_file_result_id" type="slong" indexed="true" stored="true" required="true" omitNorms="true" multivalued="false" />
<field name="syndrome" type="string" indexed="true" stored="true" required="true" omitNorms="false" multivalued="false" />
</fields>
<uniqueKey>test_file_result_id</uniqueKey>
<defaultSearchField>syndrome</defaultSearchField>
No changes in solrconfig.xml.
test_file_result_id is a 10-digit ID, and the syndrome field stores a blob containing a large amount of data (a kind of log file content).
I would like to mention that when I search by test_file_result_id, results come back within a second, but a search on syndrome takes more than a minute.
Thanks in advance!!
I am assuming that string is defined as solr.StrField in your schema.xml.
A StrField is not analyzed, so each blob is indexed as one single huge term. Since the field holds a large blob of text, it would probably help to use a field type with a suitable tokenizer and filters.
For example, StandardTokenizerFactory splits the text into individual word tokens, so the log content is indexed as separate searchable terms.
An example of the fieldtype definition:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
You could try something like this; it should make a difference to the response time.
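To use this type, the syndrome field from the question's schema.xml would have to reference it instead of string. A minimal sketch, assuming the fieldType above is declared in the types section of the same schema.xml under the name text_general, and that the other field attributes from the question stay as they are:

```xml
<!-- Assumption: text_general is the fieldType defined above. -->
<!-- omitNorms is left out here so the value set on the fieldType applies. -->
<field name="syndrome" type="text_general" indexed="true" stored="true" required="true" multiValued="false" />
```

A changed field type only affects documents indexed after the change, so the DIH full import would need to be run again before queries on syndrome use the new analysis.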