elasticsearch, solr, lucene, lucene.net

Lucene | How to find prefix matches at beginning of field?


I want to match prefixes near the start of a field. I have the code below, but it does not match prefixes; it only matches when the search term is exactly the whole first word. There seems to be no way to combine SpanTermQuery and PrefixQuery.

        var nameTerm = new Term("name", searchTerm);

        var prefixName = new PrefixQuery(nameTerm);

        var prefixAtStart = new BooleanQuery
        {
            { prefixName, Occur.MUST },
            {  new SpanFirstQuery(new SpanTermQuery(nameTerm), 0), Occur.MUST }
        };

For example:

  • Search term: "Comp"
  • Want to find: "Computer science class" and "Comp Sci"
  • Only finding: "Comp Sci"
  • Don't want to find: "Apple's latest computer"

Can the RegexpQuery be made to understand positions?


Solution

  • If you only need to match a prefix of the whole field value, use a field type whose analyzer keeps the value as a single lowercased token:

    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    

    The query then looks like this:

    field:comp*
    

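    For the second case you need an edge n-gram filter, so you can use the field type below. Note that the grams are built per token, so this type matches a prefix (of at least 3 characters) of any word in the field, not only the first word.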

    <field name="text_prefix" type="text_prefix" indexed="true" stored="false"/>
    
    <fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15" side="front"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
      </analyzer>
    </fieldType>
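
As a fuller sketch of the first approach, the keyword-tokenized analyzer could be wired into a Solr schema roughly as follows. The field and type name `name_prefix` and the `copyField` from a `name` field are assumptions made for illustration; they do not appear in the original answer.

```xml
<!-- Hypothetical names: "name_prefix" and the source field "name" are illustrative only. -->
<fieldType name="name_prefix" class="solr.TextField">
  <analyzer>
    <!-- Keep the entire field value as one token, then lowercase it. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name_prefix" type="name_prefix" indexed="true" stored="false"/>

<!-- Populate the prefix field from the original field at index time. -->
<copyField source="name" dest="name_prefix"/>
```

Because the whole value is indexed as a single lowercased token, a query such as `name_prefix:comp*` would match "Computer science class" and "Comp Sci" but not "Apple's latest computer". A search term that contains a space would need the space escaped, for example `name_prefix:comp\ sci*`.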