Let's say there are these three documents, and I need to write an Elasticsearch query that takes an item name as a parameter and returns the items that come next after it (determined by order), together with how often each one occurs.
itemArray is defined as a nested object, but it does not have to be nested. I'm a bit lost in the documentation. Any help will be appreciated.
Data Example:
doc-1
{
"id" : 0,
"itemArray": [
{
"name":"X",
"order" : 0
},
{
"name":"Y",
"order" : 1
},
{
"name":"Z",
"order" : 2
}
]
}
doc-2
{
"id" : 1,
"itemArray": [
{
"name":"X",
"order" : 0
},
{
"name":"Y",
"order" : 1
},
{
"name":"T",
"order" : 2
}
]
}
doc-3
{
"id" : 2,
"itemArray": [
{
"name":"X",
"order" : 0
},
{
"name":"Y",
"order" : 1
},
{
"name":"Z",
"order" : 2
}
]
}
Response example for the input "X" (three documents contain Y right after X in their array, according to order):
{
"Y": 3
}
Response example for the input "Y" (two documents contain Z and one document contains T right after Y in their array, according to order):
{
"Z": 2,
"T": 1
}
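To pin down the expected semantics, here is a plain-Python sketch that computes the same counts directly from the three documents above. It runs in memory only and is not an Elasticsearch query; next_item_counts is a hypothetical helper name. If a name appears more than once in an array, each occurrence is counted.

```python
from collections import Counter

# The three example documents from the question, as Python dicts.
DOCS = [
    {"id": 0, "itemArray": [{"name": "X", "order": 0}, {"name": "Y", "order": 1}, {"name": "Z", "order": 2}]},
    {"id": 1, "itemArray": [{"name": "X", "order": 0}, {"name": "Y", "order": 1}, {"name": "T", "order": 2}]},
    {"id": 2, "itemArray": [{"name": "X", "order": 0}, {"name": "Y", "order": 1}, {"name": "Z", "order": 2}]},
]


def next_item_counts(docs, name):
    """Count which item directly follows `name` (by order), across all documents."""
    counts = Counter()
    for doc in docs:
        # Sort by order so that list position matches the logical sequence.
        items = sorted(doc["itemArray"], key=lambda item: item["order"])
        # Walk over consecutive pairs (current item, following item).
        for current, following in zip(items, items[1:]):
            if current["name"] == name:
                counts[following["name"]] += 1
    return dict(counts)


print(next_item_counts(DOCS, "X"))  # {'Y': 3}
print(next_item_counts(DOCS, "Y"))  # {'Z': 2, 'T': 1}
```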
Elasticsearch version: 6.2
This is quite feasible if you are willing to denormalize your data a little.
Suppose your mapping looks like this:
PUT nextval
{
"mappings": {
"item": {
"properties": {
"id": {
"type": "long"
},
"itemArray": {
"type": "nested",
"properties": {
"name": {
"type": "keyword"
},
"nextName": {
"type": "keyword"
}
}
}
}
}
}
}
Here, in each nested object, we explicitly store the name of the next item in the array. Now let's insert the data:
POST nextval/item/0
{
"id" : 0,
"itemArray": [
{
"name":"X",
"nextName":"Y"
},
{
"name":"Y",
"nextName":"Z"
},
{
"name":"Z"
}
]
}
POST nextval/item/1
{
"id" : 1,
"itemArray": [
{
"name":"X",
"nextName":"Y"
},
{
"name":"Y",
"nextName":"T"
},
{
"name":"T"
}
]
}
POST nextval/item/2
{
"id" : 2,
"itemArray": [
{
"name":"X",
"nextName":"Y"
},
{
"name":"Y",
"nextName":"Z"
},
{
"name":"Z"
}
]
}
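The nextName values above were filled in by hand. A minimal sketch of deriving them from order on the client side before indexing, assuming you have the documents in their original form; add_next_names is a hypothetical helper, and it drops order to match the mapping above. Sending the result to Elasticsearch is left to whatever client you use.

```python
import json


def add_next_names(doc):
    """Return a copy of `doc` in which every item stores its successor's name."""
    # Sort by order so that "next" means the next item in the logical sequence.
    items = sorted(doc["itemArray"], key=lambda item: item["order"])
    denormalized = []
    for index, item in enumerate(items):
        entry = {"name": item["name"]}
        # The last item has no successor, so it gets no nextName field at all.
        if index + 1 < len(items):
            entry["nextName"] = items[index + 1]["name"]
        denormalized.append(entry)
    return {"id": doc["id"], "itemArray": denormalized}


# doc-2 from the question, in its original form (with order).
doc_2 = {
    "id": 1,
    "itemArray": [
        {"name": "X", "order": 0},
        {"name": "Y", "order": 1},
        {"name": "T", "order": 2},
    ],
}

# Prints the same data as the POST nextval/item/1 body above.
print(json.dumps(add_next_names(doc_2), indent=2))
```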
Then use a query like this to obtain the result for the input X:
POST nextval/item/_search
{
"query": {
"nested": {
"path": "itemArray",
"query": {
"term": {
"itemArray.name": "X"
}
}
}
},
"aggs": {
"1. setup nested": {
"nested": {
"path": "itemArray"
},
"aggs": {
"2. filter agg results": {
"filter": {
"term": {
"itemArray.name": "X"
}
},
"aggs": {
"3. aggregate by nextName": {
"terms": {
"field": "itemArray.nextName"
}
}
}
}
}
}
}
}
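Since the question asks for a query that takes the item name as a parameter, here is a sketch that builds the same request body for an arbitrary name in Python. The function name and the size argument are my additions (size is the standard terms-aggregation parameter, which defaults to 10); the resulting dict would be sent as the body of POST nextval/item/_search with whatever client you use.

```python
import json


def build_next_items_query(name, size=10):
    """Build the search body above for an arbitrary item name."""
    name_filter = {"term": {"itemArray.name": name}}
    return {
        # Only match documents that contain the requested item somewhere.
        "query": {"nested": {"path": "itemArray", "query": name_filter}},
        "aggs": {
            "1. setup nested": {
                # Switch the aggregation context to the nested objects.
                "nested": {"path": "itemArray"},
                "aggs": {
                    "2. filter agg results": {
                        # Keep only the nested objects whose name is the input.
                        "filter": name_filter,
                        "aggs": {
                            "3. aggregate by nextName": {
                                "terms": {"field": "itemArray.nextName", "size": size}
                            }
                        },
                    }
                },
            }
        },
    }


print(json.dumps(build_next_items_query("Y"), indent=2))
```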
The output will look like this:
{
...,
"aggregations": {
"1. setup nested": {
"doc_count": 9,
"2. filter agg results": {
"doc_count": 3,
"3. aggregate by nextName": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Y",
"doc_count": 3
}
]
}
}
}
}
}
If we run the query for the input Y, the output will be:
{
...,
"aggregations": {
"1. setup nested": {
"doc_count": 9,
"2. filter agg results": {
"doc_count": 3,
"3. aggregate by nextName": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Z",
"doc_count": 2
},
{
"key": "T",
"doc_count": 1
}
]
}
}
}
}
}
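To get exactly the shape asked for in the question ({"Z": 2, "T": 1}), the buckets can be flattened on the client. A sketch, assuming the response has already been parsed from JSON into a Python dict; the aggregation names must match the ones used in the request, and the elided "..." part of the response (hits etc.) is simply left out here. next_item_counts_from_response is a hypothetical helper name.

```python
def next_item_counts_from_response(response):
    """Flatten the terms buckets into {nextName: count}, keeping bucket order."""
    terms = response["aggregations"]["1. setup nested"]["2. filter agg results"]["3. aggregate by nextName"]
    return {bucket["key"]: bucket["doc_count"] for bucket in terms["buckets"]}


# The aggregation part of the response for the input Y, as shown above.
response = {
    "aggregations": {
        "1. setup nested": {
            "doc_count": 9,
            "2. filter agg results": {
                "doc_count": 3,
                "3. aggregate by nextName": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                        {"key": "Z", "doc_count": 2},
                        {"key": "T", "doc_count": 1},
                    ],
                },
            },
        }
    }
}

print(next_item_counts_from_response(response))  # {'Z': 2, 'T': 1}
```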
One important thing to know about nested objects is that each nested object is indexed as a separate, hidden document. I recommend reading this page of the Guide; it provides a great explanation and examples.
Since these objects are separate, we lose the information about their position in the array. This is the reason you put order there in the first place.
That's why we put the nextName field in the nested object: so that each object itself knows who its neighbor is.
Let's recap. Our query has basically 4 essential parts:
1) the nested query itemArray.name==X
2) the nested aggregation
3) the filter aggregation
4) the terms aggregation
1) is pretty obvious: we only want documents that match our request. 2) is also straightforward: since itemArray is nested, we can only aggregate on its fields within a nested context.
3) is the tricky one. Let's return to the output of the query for the input Y:
{
...,
"aggregations": {
"1. setup nested": {
"doc_count": 9,
"2. filter agg results": {
"doc_count": 3,
"3. aggregate by nextName": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Z",
"doc_count": 2
},
{
"key": "T",
"doc_count": 1
}
]
}
}
}
}
}
The doc_count of the first aggregation is 9. Why 9? Because that is the number of nested objects in the documents that matched our search query (3 documents with 3 items each). This is why we need aggregation 3): out of all these items, it selects only those whose itemArray.name equals the requested name (X in the query above, Y in this output).
And 4) is again simple: it just counts how many times each value of the itemArray.nextName field occurs.
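To make the 9 and 3 counts concrete, here is a plain-Python simulation of the four steps over the three indexed documents. It only mimics the counting semantics in memory and is not how Elasticsearch actually executes the query; simulate is a hypothetical name. Counter.most_common() orders by count, which roughly matches the terms aggregation's default ordering by doc_count.

```python
from collections import Counter

# The three indexed (denormalized) documents from above.
INDEXED = [
    {"id": 0, "itemArray": [{"name": "X", "nextName": "Y"}, {"name": "Y", "nextName": "Z"}, {"name": "Z"}]},
    {"id": 1, "itemArray": [{"name": "X", "nextName": "Y"}, {"name": "Y", "nextName": "T"}, {"name": "T"}]},
    {"id": 2, "itemArray": [{"name": "X", "nextName": "Y"}, {"name": "Y", "nextName": "Z"}, {"name": "Z"}]},
]


def simulate(docs, name):
    # 1) nested query: keep documents with at least one item named `name`.
    matched = [d for d in docs if any(item["name"] == name for item in d["itemArray"])]
    # 2) nested aggregation: every item of a matched document counts on its own.
    nested = [item for d in matched for item in d["itemArray"]]
    # 3) filter aggregation: keep only the nested objects named `name`.
    filtered = [item for item in nested if item["name"] == name]
    # 4) terms aggregation: count nextName values; objects without the field are skipped.
    buckets = Counter(item["nextName"] for item in filtered if "nextName" in item)
    return len(nested), len(filtered), buckets.most_common()


print(simulate(INDEXED, "Y"))  # (9, 3, [('Z', 2), ('T', 1)])
```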
Probably, yes. It depends on your data, on your needs, and on how free you are to change the mapping. For instance, if you are just exploring your data, scripted aggregations have huge potential.
Hope that helps!