Search code examples
elasticsearchaggregatepostgresql-9.1elasticsearch-aggregation

Can we add custom aggregated value in Elastic Search


I have three tables Customer, Product and Transactions. I have used Elastic Search to build index joining the three tables. The output of the index looks something like this

{
    "customer_id": 101,
    "name": "John Doe",
    "transactions":
    [
        {
            "product_id": 11,
            "product_name": "T-Shirt",
            "transaction_id": "TX101",
            "price": "500"
        },
        {
            "product_id": 11,
            "product_name": "T-Shirt",
            "transaction_id": "TX101",
            "price": "600"
        },
        {
            "product_id": 12,
            "product_name": "Shirt",
            "transaction_id": "TX102",
            "price": "1000"
        }
    ]
}

I would like a total_transaction field that contains sum of price from all the transactions.

Is it possible to have the field dynamically added while building the index or while retrieving the documents.

I tried the aggregates, but they work on the whole data. The aggregate functions doesnt work within each document.

I use Postgres database. If there is another way to achieve this, that would also be welcomed


Solution

  • It is possible to achieve this at indexing time using an ingest pipeline with a script processor:

    POST _ingest/pipeline/_simulate
    {
      "pipeline": {
        "processors": [
          {
            "script": {
              "description": "Sum transaction priceses",
              "lang": "painless",
              "source": """
                    ctx.total_transaction = ctx.transactions.stream().map(x -> Integer.parseInt(x.price)).reduce(0, (a, b) -> a + b);
                  """
            }
          }
        ]
      },
      "docs": [
        {
          "_source": {
            "customer_id": 101,
            "name": "John Doe",
            "transactions": [
              {
                "product_id": 11,
                "product_name": "T-Shirt",
                "transaction_id": "TX101",
                "price": "500"
              },
              {
                "product_id": 11,
                "product_name": "T-Shirt",
                "transaction_id": "TX101",
                "price": "600"
              },
              {
                "product_id": 12,
                "product_name": "Shirt",
                "transaction_id": "TX102",
                "price": "1000"
              }
            ]
          }
        }
      ]
    }
    

    Response:

    {
      "docs" : [
        {
          "doc" : {
            "_source" : {
              "total_transaction" : 2100,
              "name" : "John Doe",
              "customer_id" : 101,
              "transactions" : [
                {
                  "transaction_id" : "TX101",
                  "product_name" : "T-Shirt",
                  "price" : "500",
                  "product_id" : 11
                },
                {
                  "transaction_id" : "TX101",
                  "product_name" : "T-Shirt",
                  "price" : "600",
                  "product_id" : 11
                },
                {
                  "transaction_id" : "TX102",
                  "product_name" : "Shirt",
                  "price" : "1000",
                  "product_id" : 12
                }
              ]
            }
          }
        }
      ]
    }
    

    If transactions is NOT nested, you can also achieve this at search time using script fields, but it's going to be less performant because the script needs to execute for each matching document:

    GET test/_search
    {
      "_source": true,
      "query": {
        "match_all": {}
      },
      "script_fields": {
        "total_transaction": {
          "script": {
            "lang": "painless",
            "source": "doc['transactions.price'].stream().reduce(0, (a, b) -> a + b)"
          }
        }
      }
    }
    

    Response:

    {
      "hits" : {
        "hits" : [
          {
            "_source" : {
              "name" : "John Doe",
              "customer_id" : 101,
              "transactions" : [
                {
                  "transaction_id" : "TX101",
                  "product_name" : "T-Shirt",
                  "price" : 500,
                  "product_id" : 11
                },
                {
                  "transaction_id" : "TX101",
                  "product_name" : "T-Shirt",
                  "price" : 600,
                  "product_id" : 11
                },
                {
                  "transaction_id" : "TX102",
                  "product_name" : "Shirt",
                  "price" : 1000,
                  "product_id" : 12
                }
              ]
            },
            "fields" : {
              "total_transaction" : [
                2100
              ]
            }
          }
        ]
      }
    }