Tags: hibernate-search, hibernate-search-6

Customize EdgeNGramFilter minGramSize and maxGramSize in Hibernate Search 6.1.8.Final with the Lucene backend


I am trying to implement autocomplete, following the "Search analyzer" section of this Hibernate Search 6.0.0.Beta2 release.

This is the example from the above link that I am trying to follow.

@Entity
@Indexed
public class Book {

    @Id
    private Long id;

    @FullTextField(
            name = "title_autocomplete",
            analyzer = "autocomplete",
            searchAnalyzer = "autocomplete_query"
    )
    private String title;

    // ... getters and setters ...
}

To define an analyzer named "autocomplete" and a search analyzer named "autocomplete_query", I followed section 10.6.4, "Custom analyzers and normalizers", defined the following custom Lucene analysis configurer, and registered it in a new persistence.xml:

public class CustomLuceneAnalysisConfigurer implements LuceneAnalysisConfigurer {

    @Override
    public void configure(LuceneAnalysisConfigurationContext context) {
        context.analyzer("autocomplete").custom()
            .tokenizer(StandardTokenizerFactory.class)
            .charFilter(HTMLStripCharFilterFactory.class)
            .tokenFilter(LowerCaseFilterFactory.class)
                    .param("language", "English")
            .tokenFilter(ASCIIFoldingFilterFactory.class)
            .tokenFilter(EdgeNGramFilterFactory.class);

        context.analyzer("autocomplete_query").custom()
            .tokenizer(StandardTokenizerFactory.class)
            .charFilter(HTMLStripCharFilterFactory.class)
            .tokenFilter(LowerCaseFilterFactory.class)
            .param("language", "English")
            .tokenFilter(ASCIIFoldingFilterFactory.class);
    }
}
<property name="hibernate.search.backend.analysis.configurer"
  value="class:net.ad.mc.lucene_search.CustomLuceneAnalysisConfigurer"/>

My question is: is there a way to set minGramSize and maxGramSize with this approach? I've gone through the official documentation but found no information on how to do it.
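To make the two parameters concrete, here is a plain-Java sketch of the autocomplete idea. It is an approximation written for illustration, not Lucene's code: at index time each token is expanded into its prefixes whose length lies between minGramSize and maxGramSize, while at query time the input is only lowercased and folded. The tokenizer and ASCII folding below are rough stand-ins, the class and method names are hypothetical, and requiring every query token to match is a choice made for this sketch.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical plain-Java approximation (NOT Lucene) of the
// "autocomplete" (index) / "autocomplete_query" (search) analyzer pair.
public class EdgeNGramSketch {

    // Rough stand-in for StandardTokenizer + LowerCaseFilter + ASCIIFoldingFilter.
    static List<String> baseTokens(String text) {
        // Decompose accented characters, then drop the combining marks.
        String folded = Normalizer.normalize(text, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        List<String> tokens = new ArrayList<>();
        for (String t : folded.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Prefixes of the token whose length is within [minGramSize, maxGramSize].
    static List<String> edgeNGrams(String token, int minGramSize, int maxGramSize) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGramSize, token.length());
        for (int len = minGramSize; len <= upper; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // Index-time analysis: every base token is replaced by its edge n-grams.
    static Set<String> indexTerms(String text, int minGramSize, int maxGramSize) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : baseTokens(text)) {
            terms.addAll(edgeNGrams(token, minGramSize, maxGramSize));
        }
        return terms;
    }

    // Query-time analysis produces no n-grams; each query token must equal an indexed gram.
    static boolean matches(String indexedText, String query, int minGramSize, int maxGramSize) {
        Set<String> terms = indexTerms(indexedText, minGramSize, maxGramSize);
        List<String> queryTokens = baseTokens(query);
        return !queryTokens.isEmpty() && terms.containsAll(queryTokens);
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("hibernate", 3, 7));              // [hib, hibe, hiber, hibern, hiberna]
        System.out.println(matches("Hibernate Search", "hiber", 3, 7));     // true
        System.out.println(matches("Hibernate Search", "hi", 3, 7));        // false
        System.out.println(matches("Hibernate Search", "hibernate", 3, 7)); // false
    }
}
```

Under these assumptions, bounds of 3 and 7 mean that typing "hi" finds nothing and, less obviously, neither does the full word "hibernate", because no indexed gram is longer than 7 characters. That trade-off is worth keeping in mind when choosing maxGramSize.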


Solution

  • This can be done the same way you pass the language parameter in your configurer: tokenFilter() returns a DSL step exposing a param() method, through which you can pass any filter-specific parameter:

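    import org.apache.lucene.analysis.charfilter.HTMLStripCharFilterFactory;
    import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
    import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilterFactory;
    import org.apache.lucene.analysis.ngram.EdgeNGramFilterFactory;
    import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
    import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurationContext;
    import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurer;

    public class CustomLuceneAnalysisConfigurer implements LuceneAnalysisConfigurer {

        @Override
        public void configure(LuceneAnalysisConfigurationContext context) {
            // Index-time analyzer: emits edge n-grams (prefixes) of each token.
            context.analyzer("autocomplete").custom()
                    .tokenizer(StandardTokenizerFactory.class)
                    .charFilter(HTMLStripCharFilterFactory.class)
                    // LowerCaseFilterFactory takes no parameters; Lucene rejects
                    // unknown ones such as "language", so none are passed here.
                    .tokenFilter(LowerCaseFilterFactory.class)
                    .tokenFilter(ASCIIFoldingFilterFactory.class)
                    // param() applies to the filter declared just before it.
                    .tokenFilter(EdgeNGramFilterFactory.class)
                            .param("minGramSize", "3")
                            .param("maxGramSize", "7");

            // Query-time analyzer: same chain without n-grams, so the user's
            // input is matched as-is against the indexed prefixes.
            context.analyzer("autocomplete_query").custom()
                    .tokenizer(StandardTokenizerFactory.class)
                    .charFilter(HTMLStripCharFilterFactory.class)
                    .tokenFilter(LowerCaseFilterFactory.class)
                    .tokenFilter(ASCIIFoldingFilterFactory.class);
        }
    }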
    

    If you are unsure about the parameter names, open the filter factory's source and look for its constructor accepting a Map<String, String>: the parameter names are the keys it reads from that map.
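The convention behind that tip can be sketched with a hypothetical factory class that mimics, as far as I know, how Lucene's analysis factories handle their parameters: the constructor receives a Map<String, String>, removes each key it understands, and rejects whatever is left over. That is why the parameter names can be read off the constructor, and why a misspelled or unsupported name fails when the analyzer is built rather than being silently ignored. ParamFactorySketch and its error messages are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class: mimics the Map-based parameter convention of Lucene's
// analysis factories, but is not Lucene code.
public class ParamFactorySketch {

    final int minGramSize;
    final int maxGramSize;

    ParamFactorySketch(Map<String, String> params) {
        // Copy so the caller's (possibly immutable) map is left untouched.
        Map<String, String> args = new HashMap<>(params);
        // The supported parameter names are exactly the keys consumed here.
        minGramSize = requireInt(args, "minGramSize");
        maxGramSize = requireInt(args, "maxGramSize");
        // Anything not consumed above is an unknown parameter.
        if (!args.isEmpty()) {
            throw new IllegalArgumentException("Unknown parameters: " + args);
        }
    }

    // Removes a mandatory integer parameter from the map, failing if it is absent.
    private static int requireInt(Map<String, String> args, String name) {
        String value = args.remove(name);
        if (value == null) {
            throw new IllegalArgumentException("Missing required parameter: " + name);
        }
        return Integer.parseInt(value);
    }

    public static void main(String[] args) {
        ParamFactorySketch ok = new ParamFactorySketch(Map.of("minGramSize", "3", "maxGramSize", "7"));
        System.out.println(ok.minGramSize + ".." + ok.maxGramSize); // 3..7
        try {
            new ParamFactorySketch(Map.of("minGramSize", "3", "maxGramSize", "7", "language", "English"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Unknown parameters: {language=English}
        }
    }
}
```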