elasticsearch, elasticsearch-aggregation, elasticsearch-6

Elasticsearch Pipelined search?


I've been using Elasticsearch at my company for a while, and it has worked well for our searches so far. Our customers are now bringing more complex use cases that need more "ad-hoc/advanced" query capabilities and inter-document relationships (joins in the traditional sense). I understand that ES isn't built for joins and that denormalisation is the recommended approach. So far we have denormalised the documents to support every use case, but that has become overly complex and expensive: our customers have to wait a long time for each such code change to be rolled out.

Our business side increasingly criticizes us: "Hey, your data model isn't right. It isn't suited for smarter queries". Every time, it is painfully hard for the team to make them understand why denormalisation is required.

A few examples of the problems:

"Find me all the persons having the same birthdays"
"Find me all the persons travelling to the same cities within the same time frame"

Imagine every event document is a person record with their travel details.

So is there a concept of a pipeline search where I can break the search into multiple search queries and pass the output of one as an input to another? Or is there any other recommended way to solve these types of problems without having to boil the ocean?


Solution

  • The two queries above can be solved with aggregations.

    I'm assuming the following sample document/schema:

    {
      "firstName": "John",
      "lastName": "Doe",
      "birthDate": "1998-04-02",
      "travelDate": "2019-10-31",
      "city": "London"
    }
    

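    The first one can be solved with a terms aggregation on the month and day of the birthDate field (extracted by a script) and min_doc_count: 2, so that only birthdays shared by at least two persons are returned. The top_hits sub-aggregation lists the matching persons (three per bucket by default), e.g.: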

    {
      "size": 0,
      "aggs": {
        "birthdays": {
          "terms": {
            "script": "return LocalDate.parse(params._source.birthDate).format(DateTimeFormatter.ofPattern('MM/dd'))",
            "min_doc_count": 2
          },
          "aggs": {
            "persons": {
              "top_hits": {}
            }
          }
        }
      }
    }
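To make the bucketing logic concrete, here is a minimal client-side sketch in Python of what the script plus min_doc_count: 2 is meant to compute. It is only an illustration, not how Elasticsearch executes the aggregation; the function name `shared_birthdays` and the sample people other than John Doe are made up for the example.

```python
from collections import defaultdict
from datetime import date


def shared_birthdays(persons, min_doc_count=2):
    """Group persons by the MM/dd part of birthDate and keep only
    groups with at least `min_doc_count` members (hypothetical helper)."""
    buckets = defaultdict(list)
    for person in persons:
        # Same key the aggregation script builds: month/day, year ignored.
        key = date.fromisoformat(person["birthDate"]).strftime("%m/%d")
        buckets[key].append(person)
    return {k: v for k, v in buckets.items() if len(v) >= min_doc_count}


# Made-up sample data following the schema assumed in the answer.
people = [
    {"firstName": "John", "lastName": "Doe", "birthDate": "1998-04-02"},
    {"firstName": "Jane", "lastName": "Roe", "birthDate": "1985-04-02"},
    {"firstName": "Max", "lastName": "Poe", "birthDate": "1990-12-24"},
]
result = shared_birthdays(people)
```

Max's birthday falls into a bucket of size one, so it is dropped, just as min_doc_count: 2 drops it on the server side.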
    

    The second one can be solved with a terms aggregation on the city field, constrained by a range query on the travelDate field for the desired time frame (add min_doc_count: 2 to the terms aggregation if cities visited by only one person should be left out):

    {
      "size": 0,
      "query": {
        "range": {
          "travelDate": {
            "gte": "2019-10-01",
            "lt": "2019-11-01"
          }
        }
      },
      "aggs": {
        "cities": {
          "terms": {
            "field": "city.keyword"
          },
          "aggs": {
            "persons": {
              "top_hits": {}
            }
          }
        }
      }
    }
    

    The second query can also be done with field collapsing:

    {
      "_source": false,
      "query": {
        "range": {
          "travelDate": {
            "gte": "2019-10-01",
            "lt": "2019-11-01"
          }
        }
      },
      "collapse": {
        "field": "city.keyword",
        "inner_hits": {
          "name": "people"
        }
      }
    }
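A collapsed search returns one top-level hit per city, with the grouped documents under `inner_hits`. The Python sketch below shows one way to read such a response. The `sample_response` dict is hand-written to mimic the usual response layout and is not real output, and `people_by_city` is a hypothetical helper; it assumes the inner hits carry `_source`.

```python
def people_by_city(response, inner_hits_name="people"):
    """Map each collapsed city to the _source of its inner hits
    (hypothetical helper; assumes inner hits include _source)."""
    grouped = {}
    for hit in response["hits"]["hits"]:
        # The collapse key is reported under "fields" as a one-element list.
        city = hit["fields"]["city.keyword"][0]
        inner = hit["inner_hits"][inner_hits_name]["hits"]["hits"]
        grouped[city] = [h["_source"] for h in inner]
    return grouped


# Hand-written sample shaped like a collapse response (not real output).
sample_response = {
    "hits": {
        "hits": [
            {
                "fields": {"city.keyword": ["London"]},
                "inner_hits": {
                    "people": {
                        "hits": {
                            "hits": [
                                {"_source": {"firstName": "John", "city": "London"}},
                                {"_source": {"firstName": "Jane", "city": "London"}},
                            ]
                        }
                    }
                },
            }
        ]
    }
}
grouped = people_by_city(sample_response)
```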
    

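    If you need both results in a single request, the two aggregations can be combined. The range constraint then moves from the top-level query into a filter aggregation, so that it applies only to the cities and not to the birthdays: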

    {
      "size": 0,
      "aggs": {
        "birthdays": {
          "terms": {
            "script": "return LocalDate.parse(params._source.birthDate).format(DateTimeFormatter.ofPattern('MM/dd'))",
            "min_doc_count": 2
          },
          "aggs": {
            "persons": {
              "top_hits": {}
            }
          }
        },
        "travels": {
          "filter": {
            "range": {
              "travelDate": {
                "gte": "2019-10-01",
                "lt": "2019-11-01"
              }
            }
          },
          "aggs": {
            "cities": {
              "terms": {
                "field": "city.keyword"
              },
              "aggs": {
                "persons": {
                  "top_hits": {}
                }
              }
            }
          }
        }
      }
    }
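The combined response nests the city buckets one level deeper than the birthday buckets, because they sit inside the `travels` filter aggregation. A small Python sketch of walking both, assuming the usual aggregation response layout; `sample_response` is hand-written for illustration and `summarize` is a hypothetical helper, not part of any client library.

```python
def first_names(bucket):
    """Collect firstName from a bucket's "persons" top_hits results."""
    return [h["_source"]["firstName"] for h in bucket["persons"]["hits"]["hits"]]


def summarize(response):
    """Return (birthday -> names, city -> names) from the combined response."""
    aggs = response["aggregations"]
    birthdays = {b["key"]: first_names(b) for b in aggs["birthdays"]["buckets"]}
    # "cities" is a sub-aggregation of the "travels" filter aggregation.
    cities = {b["key"]: first_names(b) for b in aggs["travels"]["cities"]["buckets"]}
    return birthdays, cities


def _hits(*names):
    # Builds a top_hits-shaped fragment for the hand-written sample below.
    return {"hits": {"hits": [{"_source": {"firstName": n}} for n in names]}}


# Hand-written sample shaped like the combined response (not real output).
sample_response = {
    "aggregations": {
        "birthdays": {
            "buckets": [
                {"key": "04/02", "doc_count": 2, "persons": _hits("John", "Jane")}
            ]
        },
        "travels": {
            "doc_count": 3,
            "cities": {
                "buckets": [
                    {"key": "London", "doc_count": 2, "persons": _hits("John", "Max")},
                    {"key": "Paris", "doc_count": 1, "persons": _hits("Jane")},
                ]
            },
        },
    }
}
birthdays, cities = summarize(sample_response)
```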