Search code examples
phpelasticsearchelastica

Elastica not grouping aggregation before setScript


I have a type product in elasticsearch that contains a column containing multiple ids, some of them are the same. There are also columns containing current price and the quantity. I want the to get the sum of price * qte of each unique id.

id    price    qte
__    _____    _________

1    25        4
1    25        4
1    25        4
2    38        2
2    38        2
3    12        3
3    12        3
3    12        3
3    12        3
4    33        6
5    64        8
5    64        8

(if you're wondering why it's like that, it's cuz there are other columns with different values for each, also note that each unique id has a unique price and a unique quantity)

So I created my aggregation :

$id = new \Elastica\Aggregation\Terms('id');
$id->setField('id')->setSize(0);

$qte_price = new \Elastica\Aggregation\Sum('qte_price');
$qte_price->setScript('doc["price"].value * doc["qte"].value');

$id->addAggregation($qte_price);

The problem here, is that qte_price doesn't use the first aggregation on the id before performing the setScript(), thus summing the total of price * qte on all ids even the repeated ones.

In other words I want to calculate for id=1 => 25*4 (and not (25*4)*3, for id=2 => 38*2 (and not (38*2)*2) .. etc

I found a trick to overcome this problem is by dividing the answer by the doc_count, but I'm looking for an official way of doing this using Elastica.


Solution

  • I would use the avg aggregation for both the price and qte fields and then use a bucket_script pipeline aggregation to multiply the average price by the average quantity and that would do what you expect.

    The average of 3*25 is 25 and the average of 3*4 is 4, then you can multiply 25 by 4 and you get your answer for id = 1... The same goes for the other ids.

    In pure DSL, it'd look like this:

    {
      "size": 0,
      "aggs": {
        "by_id": {
          "terms": {
            "field": "id"
          },
          "aggs": {
            "avg_price": {
              "avg": {
                "field": "price"
              }
            },
            "avg_qte": {
              "avg": {
                "field": "qte"
              }
            },
            "price_by_qte": {
              "bucket_script": {
                "buckets_path": {
                  "avgPrice": "avg_price",
                  "avgQte": "avg_qte"
                },
                "script": "params.avgPrice * params.avgQte"
              }
            }
          }
        }
      }
    }
    

    Expressed in Elastica, it would go like this:

    $id = new \Elastica\Aggregation\Terms('id');
    $id->setField('id')->setSize(0);
    
    $avg_price = new \Elastica\Aggregation\Avg('avg_price');
    $avg_price->setField('price');
    $id->addAggregation($avg_price);
    
    $avg_qte = new \Elastica\Aggregation\Avg('avg_qte');
    $avg_qte->setField('qte');
    $id->addAggregation($avg_qte);
    
    $bucketScriptAggregation = new BucketScript(
        'price_by_qte',
        [
            'avgPrice' => 'avg_price',
            'avgQte' => 'avg_qte',
        ],
        'params.avgPrice * params.avgQte'
    );
    $id->addAggregation($bucketScriptAggregation);