c# elasticsearch nest

Elastic nest query field should exist filter


I am using Elasticsearch 5.2 and querying it with the NEST client. I have a working query with a date range that looks like this:

var boolQuery = new BoolQueryDescriptor<AttractionDocument>();

//https://github.com/elastic/elasticsearch-net/issues/2570 must is not additive, we cannot split out query as before it all has to be one big one

boolQuery.Must(
    mn => AddRegionQuery(permissions, mn),
    mn => AddOffersQuery(permissions, mn),
    mn => request.AddDateFilter ? mn.DateRange(d => d.Field(f => f.AvailableFrom).LessThanOrEquals(DateTime.Now)) : mn,
    mn => request.AddDateFilter ? mn.DateRange(d => d.Field(f => f.AvailableTo).GreaterThanOrEquals(DateTime.Now)) : mn,
    mn => AddGenresQuery(genres, mn)
);

The issue is that AvailableTo is not always populated, so the field does not exist on some documents.

I tried to add the following:

if (request.AddDateFilter)
{
    boolQuery.MustNot(mn => mn.Exists(f => f.Field(e => e.AvailableTo)));
}

The problem now is that the query becomes too restrictive. Ideally, I would want the exists part as a should clause? What I am trying to achieve is to apply the AvailableTo date range only when a document has that field, and otherwise ignore it and return the document anyway. If I take out the AvailableTo part, I do get results.


Solution

  • You should be able to combine an exists query with a range query on AvailableTo, so that documents where the field exists must also satisfy the range condition, and then create a disjunction with an exists query on AvailableTo placed in a bool query must_not clause (i.e. the inverted exists), which lets documents without the field through.

    Something like the following (I've commented out queries that aren't provided):

    var client = new ElasticClient(settings);
    
    var request = new 
    {
        AddDateFilter = true
    };
    
    var boolQuery = new BoolQueryDescriptor<AttractionDocument>();
    
    boolQuery.Must(
        // mn => AddRegionQuery(permissions, mn),
        // mn => AddOffersQuery(permissions, mn),
        mn => request.AddDateFilter ? mn.DateRange(d => d.Field(f => f.AvailableFrom).LessThanOrEquals(DateTime.Now)) : mn,
        mn => request.AddDateFilter ? (mn.Exists(d => d.Field(f => f.AvailableTo)) &&
                                      mn.DateRange(d => d.Field(f => f.AvailableTo).GreaterThanOrEquals(DateTime.Now))) ||
                                      !mn.Exists(d => d.Field(f => f.AvailableTo)) : mn //,
        // mn => AddGenresQuery(genres, mn)
    );
    
    client.Search<AttractionDocument>(s => s
        .Query(q => q.Bool(b => boolQuery))
    );
    

    This produces the following query:

    {
      "query": {
        "bool": {
          "must": [
            {
              "range": {
                "availableFrom": {
                  "lte": "2018-11-15T20:18:10.528482+10:00"
                }
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "bool": {
                      "must": [
                        {
                          "exists": {
                            "field": "availableTo"
                          }
                        },
                        {
                          "range": {
                            "availableTo": {
                              "gte": "2018-11-15T20:18:10.5304815+10:00"
                            }
                          }
                        }
                      ]
                    }
                  },
                  {
                    "bool": {
                      "must_not": [
                        {
                          "exists": {
                            "field": "availableTo"
                          }
                        }
                      ]
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
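
    To make the intent of those two must clauses easier to check, here is a minimal in-memory sketch of the same condition in plain C#, with no Elasticsearch involved. The AttractionDocument shape (in particular the nullable AvailableTo, where null stands in for a missing field) and the AvailabilityModel name are assumptions for illustration, not taken from the question.

    ```csharp
    using System;

    // Hypothetical stand-in for the question's document type;
    // only the two date fields matter for this sketch.
    public class AttractionDocument
    {
        public DateTime AvailableFrom { get; set; }

        // Assumption: null models a document that has no availableTo field.
        public DateTime? AvailableTo { get; set; }
    }

    public static class AvailabilityModel
    {
        // Mirrors the query above:
        //   availableFrom <= now
        //   AND ((availableTo exists AND availableTo >= now) OR availableTo does not exist)
        public static bool Matches(AttractionDocument doc, DateTime now)
        {
            bool started = doc.AvailableFrom <= now;
            bool hasEnd = doc.AvailableTo.HasValue;
            bool notExpired = hasEnd && doc.AvailableTo.Value >= now;

            return started && (notExpired || !hasEnd);
        }
    }
    ```

    Under this model, a document with no AvailableTo matches as long as AvailableFrom has passed, while a document whose AvailableTo is in the past is excluded.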
    

    Since the range and exists queries are predicates (a document either matches the condition or it doesn't), as opposed to queries that should calculate relevancy scores, they can be bool query filter clauses. In NEST, the unary + operator wraps a query in a bool query filter clause:

    boolQuery.Must(
        // Uncomment below queries, or add (QueryContainer[])null to run
        // mn => AddRegionQuery(permissions, mn),
        // mn => AddOffersQuery(permissions, mn),
        // mn => AddGenresQuery(genres, mn)
    ).Filter(
        mn => request.AddDateFilter ? mn.DateRange(d => d.Field(f => f.AvailableFrom).LessThanOrEquals(DateTime.Now)) : mn,
        mn => request.AddDateFilter ? (+mn.Exists(d => d.Field(f => f.AvailableTo)) &&
                                      +mn.DateRange(d => d.Field(f => f.AvailableTo).GreaterThanOrEquals(DateTime.Now))) ||
                                      !mn.Exists(d => d.Field(f => f.AvailableTo)) : mn    
    );
    
    client.Search<AttractionDocument>(s => s
        .Query(q => q.Bool(b => boolQuery))
    );
    

    which creates the query

    {
      "query": {
        "bool": {
          "filter": [
            {
              "range": {
                "availableFrom": {
                  "lte": "2018-11-15T20:22:25.4556963+10:00"
                }
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "bool": {
                      "filter": [
                        {
                          "exists": {
                            "field": "availableTo"
                          }
                        },
                        {
                          "range": {
                            "availableTo": {
                              "gte": "2018-11-15T20:22:25.4587138+10:00"
                            }
                          }
                        }
                      ]
                    }
                  },
                  {
                    "bool": {
                      "must_not": [
                        {
                          "exists": {
                            "field": "availableTo"
                          }
                        }
                      ]
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
    

    Operator overloading on queries really helps here: it lets you write complex bool queries more succinctly.
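
    As a rough illustration of how the operators map onto bool clauses, the sketch below builds the same AvailableTo condition as a standalone QueryContainer using NEST's static Query<T> helper. It assumes the NEST package is referenced; the AttractionDocument shape and the AvailabilityFilters and OpenEndedOrNotExpired names are hypothetical and only there to make the snippet self-contained.

    ```csharp
    using System;
    using Nest;

    // Hypothetical document type; the real one in the question may differ.
    public class AttractionDocument
    {
        public DateTime AvailableFrom { get; set; }
        public DateTime? AvailableTo { get; set; }
    }

    public static class AvailabilityFilters
    {
        // Builds: (availableTo exists AND availableTo >= now) OR (availableTo missing)
        public static QueryContainer OpenEndedOrNotExpired(DateTime now)
        {
            QueryContainer hasEnd = Query<AttractionDocument>
                .Exists(e => e.Field(f => f.AvailableTo));

            QueryContainer notExpired = Query<AttractionDocument>
                .DateRange(d => d.Field(f => f.AvailableTo).GreaterThanOrEquals(now));

            // ! negates a query, i.e. puts it in a bool must_not clause
            QueryContainer noEnd = !Query<AttractionDocument>
                .Exists(e => e.Field(f => f.AvailableTo));

            // unary + puts a query in a bool filter clause,
            // && combines both operands into one bool query,
            // || turns its operands into bool should clauses
            return (+hasEnd && +notExpired) || noEnd;
        }
    }
    ```

    The returned container could then be passed wherever a query is expected, for example as one of the entries in the Filter(...) call above.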