c# elasticsearch nest

What Analyzer should be used for a field with forward slash in ElasticSearch?


I'm building a search feature that allows users to filter by certain fields. One of the fields is DOI. Filtering by DOI should only return results that match exactly.

I'm trying to index the following object:

public class PaperElasticSearchModel
{
    public string Title { get; set; }
    public string Abstract { get; set; }
    public string Keywords { get; set; }
    public string DOI { get; set; }
    public int Year { get; set; }
    public string[] Author { get; set; }
}

The DOI field will have a format similar to this: 10.1523/JNEUROSCI.18-04-01622.1998.

Below are the settings for the index:

Client.Indices.Create(index, pub => pub
    .Settings(s => s
        .Analysis(descriptor => descriptor
            .Analyzers(analyzer => analyzer
                .Custom("custom_analyzer", ca => ca
                    .Filters("lowercase", "stop", "classic", "word_delimiter")
                )
            )
            .TokenFilters(bases => bases
                .EdgeNGram("custom_analyzer", td => td
                    .MinGram(2)
                    .MaxGram(25)
                )
            )
        )
        .Setting(UpdatableIndexSettings.MaxNGramDiff, 23)
    )
    .Map<PubScreenSearch>(m => m
        .AutoMap()
        .Properties(p => p
            .Text(t => t
                .Name(f => f.Author)
                .Analyzer("custom_analyzer"))
            .Text(t => t
                .Name(f => f.Keywords)
                .Analyzer("custom_analyzer"))
            .Text(t => t
                .Name(f => f.Title)
                .Analyzer("custom_analyzer"))
            .Keyword(t => t
                .Name(f => f.DOI)
            )
        )
    )
);

Which analyzer should I use so that searching by DOI returns only exact matches?

I tried using the Pattern analyzer but it did not work:

.Text(t => t    
   .Name(f => f.DOI)
   .Analyzer(new PatternAnalyzer().Pattern = @"(.(\\/|\\?)*)\\w+")

Solution:

filterQuery.Add(fq => fq
    .Bool(b => b
        .Must(must => must
            .Term(t => t
                .Field(field => field.DOI.Suffix("keyword"))
                .Value(value)))));

Solution

  • Let me clarify how the Elasticsearch analysis process works: for term and match queries at search time, and for keyword and text field types at index time.

    Official explanations:

    Match query: Returns documents that match a provided text, number, date or boolean value. The provided text is analyzed before matching.

    Term query: Returns documents that contain an exact term in a provided field.

    1. Analysis, and therefore full-text search, applies only to the text field type. To get a hit for the query JNEUROSCI against the value 10.1523/JNEUROSCI.18-04-01622.1998, use a match query on a text field.
    2. If you are looking for an exact match, use a term query on a keyword field.
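
    To see why the two field types behave differently, the _analyze API can show the tokens an analyzer produces. This is only a sketch: it assumes the text field uses the default standard analyzer, and it uses the built-in keyword analyzer as a stand-in for what a keyword field stores.

    ```console
    # Tokens a text field with the default standard analyzer would index
    GET _analyze
    {
      "analyzer": "standard",
      "text": "10.1523/JNEUROSCI.18-04-01622.1998"
    }

    # The keyword analyzer emits the whole input as a single token,
    # which approximates how a keyword field stores the value
    GET _analyze
    {
      "analyzer": "keyword",
      "text": "10.1523/JNEUROSCI.18-04-01622.1998"
    }
    ```

    The first request should return several lowercased tokens, jneurosci among them. The second should return the DOI unchanged as one token.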

    Note: As the name suggests, an exact match compares the query with the stored value as is, so it is case sensitive.
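
    If case-insensitive exact matching is ever needed, one possible approach is a keyword field with a lowercase normalizer. The sketch below is an assumption rather than part of the original setup: the index name test-analyzer-ci and the normalizer name lowercase_normalizer are made up for illustration.

    ```console
    # Hypothetical index: a keyword field whose values are lowercased
    # at index time, and whose term query input is lowercased at search time
    PUT test-analyzer-ci
    {
      "settings": {
        "analysis": {
          "normalizer": {
            "lowercase_normalizer": {
              "type": "custom",
              "filter": ["lowercase"]
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "field_name": {
            "type": "keyword",
            "normalizer": "lowercase_normalizer"
          }
        }
      }
    }
    ```

    The value is still stored as a single term, so a query for only part of the DOI would still not match.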

    Here are some examples:

    #index the test data
    PUT test-analyzer/_doc/1
    {
      "field_name": "10.1523/JNEUROSCI.18-04-01622.1998"
    }
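
    The queries that follow address both field_name and field_name.keyword. That relies on dynamic mapping: no mapping was defined for test-analyzer, so with default settings Elasticsearch is expected to map the string as a text field with a keyword subfield. A quick way to check, assuming the defaults were not changed:

    ```console
    # Inspect the mapping that dynamic mapping generated for the test index
    GET test-analyzer/_mapping
    ```

    If the response shows field_name with type text and a fields.keyword entry of type keyword, the examples below apply as written.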
    

    #term query with text field type - no hits (the query term is not analyzed, while the indexed tokens were lowercased)
    GET test-analyzer/_search
    {
      "query": {
        "term": {
          "field_name": "JNEUROSCI"
        }
      }
    }
    

    #term query with keyword field type - no hits (the keyword field holds the whole DOI as one term)
    GET test-analyzer/_search
    {
      "query": {
        "term": {
          "field_name.keyword": "JNEUROSCI"
        }
      }
    }
    

    #match query with text field type - there is a hit
    GET test-analyzer/_search
    {
      "query": {
        "match": {
          "field_name": "JNEUROSCI"
        }
      }
    }
    

    #match query with keyword field type - no hits (the keyword field holds the whole DOI as one term)
    GET test-analyzer/_search
    {
      "query": {
        "match": {
          "field_name.keyword": "JNEUROSCI"
        }
      }
    }
    

    #exact match that you're looking for - term query with keyword field type - there is a hit
    GET test-analyzer/_search
    {
      "query": {
        "term": {
          "field_name.keyword": "10.1523/JNEUROSCI.18-04-01622.1998"
        }
      }
    }
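
    The mapping in the question declares DOI directly as a keyword field rather than relying on dynamic mapping. In that case there is no .keyword subfield, and the term query targets the field itself. A minimal sketch, where the index name test-doi and the field name doi are made up for illustration:

    ```console
    # Hypothetical index with the DOI mapped explicitly as keyword
    PUT test-doi
    {
      "mappings": {
        "properties": {
          "doi": { "type": "keyword" }
        }
      }
    }

    PUT test-doi/_doc/1
    {
      "doi": "10.1523/JNEUROSCI.18-04-01622.1998"
    }

    # Exact match: term query on the keyword field itself, no .keyword suffix
    GET test-doi/_search
    {
      "query": {
        "term": {
          "doi": "10.1523/JNEUROSCI.18-04-01622.1998"
        }
      }
    }
    ```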