Tags: lucene, solr, fuzzy-search

Maximum characters in a Solr/Lucene term for fuzzy matching


I am trying to experiment with fuzzy matching in Solr.

In my indexed document, the first_name field contains "MYNEWORGANIZATION20SEP2011". The original value was "My New Organization 20-Sep-2011", but I removed the spaces and other characters.

If I search for the exact string "MYNEWORGANIZATION20SEP2011", Solr returns 1 result, the document above. Perfect!

But if I trim two characters from the string and query with "MYNEWORGANIZATION20SEP20~0.8", I get 0 results.

For the new query MYNEWORGANIZATION20SEP20, the edit distance to the indexed value is 2, so the match should be above 90%. Since my query only asks for an 80% match, it should still find the document.
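As a sanity check on that estimate, here is a minimal sketch of the arithmetic. It assumes Lucene 3.x scores a fuzzy match as 1 - distance / min(term lengths), and that the indexed term is stored exactly as typed (uppercase); the function name `fuzzy_similarity` is only illustrative, not a Solr or Lucene API.

```python
# Assumption: Lucene 3.x-style fuzzy similarity,
#   similarity = 1 - edit_distance / min(len(query_term), len(indexed_term))
def fuzzy_similarity(edit_distance: int, query_term: str, indexed_term: str) -> float:
    shorter = min(len(query_term), len(indexed_term))
    return 1.0 - edit_distance / shorter


indexed = "MYNEWORGANIZATION20SEP2011"  # 26 characters
query = "MYNEWORGANIZATION20SEP20"      # 24 characters, last two removed

score = fuzzy_similarity(2, query, indexed)
print(f"{score:.3f}")  # 0.917, above the requested 0.8 threshold
```

So under those assumptions the term length alone should not prevent a match.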

BTW, if first_name is only 6-7 characters, like "rushik", and I run a fuzzy query like "rushik~0.75", the search works properly and returns the data.

In both cases the field type is "text_general", and I am using Solr 3.3.

Is there a character limit for fuzzy search in Solr, or can it be configured somewhere? I am using the default Solr configuration and have not changed anything in solrconfig.xml.

Is there a better way to run a fuzzy query against a string like "My New Organization 20-Sep-2011" without manually removing the spaces?

Thanks, Rushik.


Solution

  • What index-time analysis is done on your field?
    The text_general field usually goes through the whitespace tokenizer, stopword filter, word delimiter and lowercase filter, in which case your indexed term is completely different from the value you supplied.
    Was the conversion from My New Organization 20-Sep-2011 -> MYNEWORGANIZATION20SEP2011 done by you before indexing?
    Also, and most importantly, fuzzy searches don't undergo query-time analysis, so an uppercase query term is compared as-is against the lowercased indexed term.

    You may want to use the string field type or a lowercase field type, e.g.

        <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
          <analyzer>
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory" />
          </analyzer>
        </fieldType>
    

    and test the query using lower case.
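
To see why case matters here, the following is a rough sketch, not Solr's actual code. It assumes the indexed term ends up lowercased, that the fuzzy query term is not analyzed, and that Lucene 3.x scores similarity as 1 - distance / min(term lengths). The helpers `levenshtein`, `similarity` and `normalize` are hypothetical client-side functions written for this illustration.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def similarity(query_term: str, indexed_term: str) -> float:
    # Assumption: Lucene 3.x-style score, 1 - distance / shorter length.
    shorter = min(len(query_term), len(indexed_term))
    return 1.0 - levenshtein(query_term, indexed_term) / shorter


def normalize(text: str) -> str:
    # Hypothetical client-side cleanup: lowercase, keep only letters and digits.
    return re.sub(r"[^0-9a-z]", "", text.lower())


indexed_term = normalize("My New Organization 20-Sep-2011")
raw_query = "MYNEWORGANIZATION20SEP20"

# Uppercase query against lowercased term: every letter counts as an edit.
print(similarity(raw_query, indexed_term))             # well below 0.8
# Lowercased query: only the two missing characters count.
print(similarity(normalize(raw_query), indexed_term))  # above 0.8

fuzzy_query = f"first_name:{normalize(raw_query)}~0.8"
print(fuzzy_query)
```

If those assumptions hold for your schema, lowercasing the term on the client before appending `~0.8` should be enough for the long value to match, just as it already does for "rushik".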