Search code examples
elasticsearchaggregate-functionselasticsearch-aggregationelasticsearch-sql

SQL aggregation query corresponding in elasticsearch


I studied elasticsearch aggregation queries but couldn't find if it supports multiple aggregate function. In an other word, I wanna know if elasticsearch can generate the equivalent of this Sql aggregation query:

  SELECT account_no, transaction_type, count(account_no), sum(amount), max(amount) FROM index_name GROUP BY account_no, transaction_type Having count(account_no) > 10

If yes, how? Thank you.


Solution

  • There are two possible ways to do what you are looking for in ES and I've mentioned them both below.

    I've also added sample mapping and sample documents for your reference.

    Mapping:

    PUT index_name
    {
      "mappings": {
        "mydocs":{
          "properties":{
            "account_no":{
              "type": "keyword"
            },
            "transaction_type":{
              "type": "keyword"
            },
            "amount":{
              "type":"double"
            }
          }
        }
      }
    }
    

    Sample Documents:

    Notice carefully, I'm only creating list of 4 transactions for 1 customer.

    POST index_name/mydocs/1
    {
      "account_no": "1011",
      "transaction_type":"credit",
      "amount": 200
    }
    
    POST index_name/mydocs/2
    {
      "account_no": "1011",
      "transaction_type":"credit",
      "amount": 400
    }
    
    POST index_name/mydocs/3
    {
      "account_no": "1011",
      "transaction_type":"cheque",
      "amount": 100
    }
    
    POST index_name/mydocs/4
    {
      "account_no": "1011",
      "transaction_type":"cheque",
      "amount": 100
    }
    

    There are two ways to get what you are looking for:

    Solution 1: Using Elasticsearch Query DSL

    Aggregation Query:

    For Aggregation Query DSL, I've made use of the below aggregation queries to solve what you are looking for.

    Below is how query is summarised version of the query so that you get the clarity on which queries are sibling and which are parents.

    - Terms Aggregation (For Every Account)
      - Terms Aggregation (For Every Transaction_type)
        - Sum Amount 
        - Max Amount
    

    Below is the actual query:

    POST index_name/_search
    {
      "size": 0, 
      "aggs": {
        "account_no_agg": {
          "terms": {
            "field": "account_no"
          },
          "aggs": {
            "transaction_type_agg": {
              "terms": {
                "field": "transaction_type",
                "min_doc_count": 2
              },
              "aggs": {
                "sum_amount": {
                  "sum": {
                    "field": "amount"
                  }
                },
                "max_amount":{
                  "max": {
                    "field": "amount"
                  }
                }
              }
            }
          }
        }
      }
    }
    

    Important thing to mention is min_doc_count which is nothing but the having count(account_no)>10, which in my query I'm filtering only those transactions with having count(account_no) > 2

    Query Response

    {
      "took" : 5,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : 4,
        "max_score" : 0.0,
        "hits" : [ ]
      },
      "aggregations" : {
        "account_no_agg" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "1011",                         <----  account_no
              "doc_count" : 4,                        <----  count(account_no)
              "transaction_type_agg" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [
                  {
                    "key" : "cheque",                 <---- transaction_type
                    "doc_count" : 2,
                    "sum_amount" : {                  <----  sum(amount)
                      "value" : 200.0
                    },
                    "max_amount" : {                  <----  max(amount)
                      "value" : 100.0
                    }
                  },
                  {
                    "key" : "credit",                 <---- another transaction_type
                    "doc_count" : 2,
                    "sum_amount" : {                  <---- sum(amount)
                      "value" : 600.0
                    },
                    "max_amount" : {                  <---- max(amount)
                      "value" : 400.0
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
    

    Notice the above result carefully, I've added comments wherever required so that it helps what part of sql query you are looking for.

    Solution 2: Using Elasticsearch SQL(_xpack solution)

    If you are making use of xpack feature of Elasticsearch's SQL Access, you can simply copy paste the SELECT Query as below for the mapping and document as mentioned above:

    Elasticsearch SQL:

    POST /_xpack/sql?format=txt
    {
      "query": "SELECT account_no, transaction_type, sum(amount), max(amount), count(account_no) FROM index_name GROUP BY account_no, transaction_type HAVING count(account_no) > 1"
    
    }
    

    Elasticsearch SQL Result:

      account_no   |transaction_type|  SUM(amount)  |  MAX(amount)  |COUNT(account_no)
    ---------------+----------------+---------------+---------------+-----------------
    1011           |cheque          |200.0          |100.0          |2                
    1011           |credit          |600.0          |400.0          |2                
    

    Note that I've tested the query in ES 6.5.4.

    Hope this helps!