I know Elastic Search could do aggregation, but you know DB aggregation is not totally same as ES world.
Like we do below in DB:
select a, b, c, d, sum(e), sum(f)
from some_table nolock
group by a, b, c, d
It will return:
a, b, c, d, sum e, sum f
1, 2, 3, 4, 100 , 100
1, 2, 3, 5, 150 , 150
...
So each a+b+c+d as different key, will have one row return.
But how can I do it in ES aggregation? It will return nested result and if I aggregation with different sequence, like a->b->c->d result will be different with d->c->b->a result. But in DB, group by a,b,c,d will not different with group by d,c,b,a.
Any suggestion is welcome, Thanks
Have tried xpack and parallelly aggregation, xpack hard to use, parallelly not as expected
First of all there is really no difference between a->b->c->d and d->c->b->a result as long as you process the keys correctly, the only difference will be the order. Here is an example:
DELETE test
PUT test
{
"mappings": {
"properties": {
"a": {
"type": "integer"
},
"b": {
"type": "integer"
},
"c": {
"type": "integer"
},
"d": {
"type": "integer"
},
"e": {
"type": "integer"
},
"f": {
"type": "integer"
}
}
}
}
POST test/_bulk?refresh
{"index": {"_id": 1}}
{"a": 1, "b": 2, "c": 3, "d": 4, "e": 50, "f": 50 }
{"index": {"_id": 2}}
{"a": 1, "b": 2, "c": 3, "d": 4, "e": 50, "f": 50 }
{"index": {"_id": 3}}
{"a": 1, "b": 2, "c": 3, "d": 5, "e": 50, "f": 50 }
{"index": {"_id": 4}}
{"a": 1, "b": 2, "c": 3, "d": 5, "e": 50, "f": 50 }
{"index": {"_id": 5}}
{"a": 1, "b": 2, "c": 3, "d": 5, "e": 50, "f": 50 }
POST test/_search
{
"size": 0,
"aggs": {
"a": {
"terms": {
"field": "a"
},
"aggs": {
"b": {
"terms": {
"field": "b"
},
"aggs": {
"c": {
"terms": {
"field": "c"
},
"aggs": {
"d": {
"terms": {
"field": "d"
},
"aggs": {
"sum_e": {
"sum": {
"field": "e"
}
},
"sum_f": {
"sum": {
"field": "f"
}
}
}
}
}
}
}
}
}
}
}
}
POST test/_search
{
"size": 0,
"aggs": {
"d": {
"terms": {
"field": "d"
},
"aggs": {
"b": {
"terms": {
"field": "b"
},
"aggs": {
"c": {
"terms": {
"field": "c"
},
"aggs": {
"a": {
"terms": {
"field": "a"
},
"aggs": {
"sum_e": {
"sum": {
"field": "e"
}
},
"sum_f": {
"sum": {
"field": "f"
}
}
}
}
}
}
}
}
}
}
}
}
However, you might be more comfortable with composite
aggregations that will produce flatter results closer to what you expect from SQL.
POST test/_search
{
"size": 0,
"aggs": {
"my_groups": {
"composite": {
"sources": [
{
"a": {
"terms": {
"field": "a"
}
}
},
{
"b": {
"terms": {
"field": "b"
}
}
},
{
"c": {
"terms": {
"field": "c"
}
}
},
{
"d": {
"terms": {
"field": "d"
}
}
}
]
},
"aggs": {
"sum_e": {
"sum": {
"field": "e"
}
},
"sum_f": {
"sum": {
"field": "f"
}
}
}
}
}
}
But you can also allow Elasticsearch to write that aggregation for you and get exactly what you expected:
POST /_sql?format=csv
{
"query": "SELECT a, b, c, d, sum(e), sum(f) FROM test GROUP BY a, b, c, d",
"fetch_size": 10
}