c# elasticsearch match nest match-phrase

Nest ElasticSearch MatchPhrase Behavior When Phrase Not Found


I am using an Elasticsearch NuGet package (NEST) to search Kibana logs in a field called Message.

What I Want to Do

Verify Text Not There

I want to search for the "Don't Find This" sub-string inside the Message field. I want to verify the string is absent.

Verify Text Present

I want to search for the sub-string "Here it is" inside the same field and verify that it is present.

What Worked

The second test was successful using the Match() function.

What Did Not Work

The first test failed. It returned 10 records because, it seems, they contained "Don't" or "Find" or "This".

A quick search suggested that I would have to change indexing or create a custom analyzer, but I don't understand the repercussions of this, i.e., what effect will it have on the currently working test?

The code below is where I tried MatchPhrase instead of Match.

I was surprised to see that using MatchPhrase caused the originally working test to fail.

            ISearchResponse<LogRecord> searchResponse = await _elasticClient.SearchAsync<LogRecord>(s => s
                .AllIndices()
                .Query(q => q
                    .Bool(b => b
                        .Must(m =>
                        {
                            var mustClauses = new List<Func<QueryContainerDescriptor<LogRecord>, QueryContainer>>();

                            // The inner lambda parameter is named mq so it does not shadow the outer m.
                            if (!string.IsNullOrEmpty(message))
                                mustClauses.Add(mc => mc.Match(mq => mq.Field(f => f.Message).Query(message)));

                            // a list of other fields here...

                            mustClauses.Add(mc => mc.DateRange(dr => dr.Field(f => f.Time).GreaterThanOrEquals(startDate ?? DateTime.MinValue).LessThanOrEquals(endDate ?? DateTime.Now)));

                            return m.Bool(b1 => b1.Must(mustClauses));
                        })
                        )
                    )
                );
            return searchResponse;

I have a SpecFlow feature file where I specify what to verify as present or absent as follows:

    Then log has
        | Message                         |
        | some raw request before sending |
        | Start Blah Blah transaction     |
        | sg_blahblah                     |
        | sg_Year                         |
        | sg_EmployeeNumber               |
    And log does not have
        | Message         |
        | this_works_fine |
        | this_works_too  |
        | No good         |


Solution

  • I'm not quite sure which query DSL you need.

    If I understand correctly, it should be something like this (my_message_field stands for your Message field):

    POST _search
    {
      "query": {
        "bool": {
          "must": [
            {
              "match_phrase": {
                "my_message_field": "some raw request before sending"
              }
            },
            {
              "match_phrase": {
                "my_message_field": "Start Blah Blah transaction"
              }
            },
            {
              "match_phrase": {
                "my_message_field": "sg_blahblah"
              }
            },
            {
              "match_phrase": {
                "my_message_field": "sg_Year"
              }
            },
            {
              "match_phrase": {
                "my_message_field": "sg_EmployeeNumber"
              }
            }
          ],
          "must_not": [
            {
              "match_phrase": {
                "my_message_field": "this_works_too"
              }
            },
            {
              "match_phrase": {
                "my_message_field": "this_works_fine"
              }
            },
            {
              "match_phrase": {
                "my_message_field": "No good"
              }
            }
          ]
        }
      }
    }
    

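    You can also put clauses in a filter to improve performance, since filter context skips relevance scoring (for example, when a keyword is a single word), or add slop to match_phrase to control how strictly the word positions must match (see: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase.html).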
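    As a rough illustration of the filter and slop options in NEST, here is a minimal sketch. The LogRecord shape, the PhraseQueries class and the PhraseFilter helper are assumptions made for the example, not code from the question.

```csharp
using System;
using Nest;

// Assumed document type; only Message and Time appear in the question.
public class LogRecord
{
    public string Message { get; set; }
    public DateTime Time { get; set; }
}

public static class PhraseQueries
{
    // Hypothetical helper: wraps a match_phrase query in the filter clause of a bool query.
    // Filter context does not compute relevance scores.
    // slop = 0 requires the terms to be adjacent and in order; larger values tolerate gaps.
    public static QueryContainer PhraseFilter(string phrase, int slop = 0)
    {
        return Query<LogRecord>.Bool(b => b
            .Filter(f => f
                .MatchPhrase(mp => mp
                    .Field(x => x.Message)
                    .Query(phrase)
                    .Slop(slop))));
    }
}
```

    Whether filter context gives a measurable gain depends on the index and the query mix, so it is worth measuring before relying on it.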

    I used match_phrase for every clause because it looks like many of your keywords contain spaces.
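    To see why multi-word keywords behave differently under match and match_phrase, the following self-contained sketch mimics the two behaviours on plain strings. It does not call Elasticsearch; the whitespace tokenizer is a deliberate simplification of a real analyzer, and MatchDemo and its methods are hypothetical names.

```csharp
using System;
using System.Linq;

public static class MatchDemo
{
    // Simplified stand-in for an analyzer: lowercase, then split on spaces.
    public static string[] Tokenize(string text) =>
        text.ToLowerInvariant().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // Like match with the default OR operator: any shared term is a hit.
    public static bool Match(string doc, string query)
    {
        var docTokens = Tokenize(doc);
        return Tokenize(query).Any(t => docTokens.Contains(t));
    }

    // Like match_phrase with slop 0: all terms must appear adjacent and in order.
    public static bool MatchPhrase(string doc, string query)
    {
        var d = Tokenize(doc);
        var q = Tokenize(query);
        if (q.Length == 0) return false;
        for (int i = 0; i + q.Length <= d.Length; i++)
        {
            if (d.Skip(i).Take(q.Length).SequenceEqual(q)) return true;
        }
        return false;
    }

    public static void Main()
    {
        const string doc = "Please find the attached report";
        Console.WriteLine(Match(doc, "Don't Find This"));       // True: "find" alone is enough
        Console.WriteLine(MatchPhrase(doc, "Don't Find This")); // False: the phrase is absent
        Console.WriteLine(MatchPhrase(doc, "the attached"));    // True: adjacent, in order
        Console.WriteLine(MatchPhrase(doc, "attach"));          // False: not a whole token
    }
}
```

    The last line shows that phrase matching works on whole tokens rather than arbitrary sub-strings, which may be worth checking if a previously passing test starts failing after switching to MatchPhrase.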


    Then append those match_phrase clauses to the clause lists in your NEST query (mustClauses in your code, plus a corresponding list for must_not).
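    In NEST, that could look roughly like the sketch below. It reuses the mustClauses pattern from the question; the LogSearcher class, the mustHave and mustNotHave parameters and the mustNotClauses list are assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Nest;

// Assumed document type; only Message and Time appear in the question.
public class LogRecord
{
    public string Message { get; set; }
    public DateTime Time { get; set; }
}

public class LogSearcher
{
    private readonly IElasticClient _elasticClient;

    public LogSearcher(IElasticClient elasticClient) => _elasticClient = elasticClient;

    // mustHave / mustNotHave are hypothetical inputs, e.g. the rows of the SpecFlow tables.
    public Task<ISearchResponse<LogRecord>> SearchAsync(
        IEnumerable<string> mustHave,
        IEnumerable<string> mustNotHave,
        DateTime? startDate = null,
        DateTime? endDate = null)
    {
        var mustClauses = new List<Func<QueryContainerDescriptor<LogRecord>, QueryContainer>>();
        var mustNotClauses = new List<Func<QueryContainerDescriptor<LogRecord>, QueryContainer>>();

        // One match_phrase clause per phrase that has to be present.
        foreach (var phrase in mustHave.Where(p => !string.IsNullOrEmpty(p)))
            mustClauses.Add(mc => mc.MatchPhrase(mp => mp.Field(f => f.Message).Query(phrase)));

        // One match_phrase clause per phrase that has to be absent.
        foreach (var phrase in mustNotHave.Where(p => !string.IsNullOrEmpty(p)))
            mustNotClauses.Add(mc => mc.MatchPhrase(mp => mp.Field(f => f.Message).Query(phrase)));

        // Same date range restriction as in the question.
        mustClauses.Add(mc => mc.DateRange(dr => dr
            .Field(f => f.Time)
            .GreaterThanOrEquals(startDate ?? DateTime.MinValue)
            .LessThanOrEquals(endDate ?? DateTime.Now)));

        return _elasticClient.SearchAsync<LogRecord>(s => s
            .AllIndices()
            .Query(q => q
                .Bool(b => b
                    .Must(mustClauses)
                    .MustNot(mustNotClauses))));
    }
}
```

    Note that with every phrase in a single must, one log record has to contain all of them. If each table row is expected in a different record, it may be simpler to run one query per phrase and check whether the response's Total is zero or not.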