I have a field whose values can contain several words. When I run an aggregation, I get the words back as separate values.
Example:
name : jess , Region : new york
name : jess , Region : poland
Request:
query = {
    "size": total,
    "aggs": {
        "buckets_for_name": {
            "terms": {
                "field": "name",
                "size": total
            },
            "aggs": {
                "region_terms": {
                    "terms": {
                        "field": "region",
                        "size": total
                    }
                }
            }
        }
    }
}
With response["aggregations"]["buckets_for_name"]["buckets"]
I get:
{'key': 'jess ', 'doc_count': 61, 'region_terms': {'doc_count_error_upper_bound': 0, 'sum_other_doc_count': 0, 'buckets': [{'key': 'oran', 'doc_count': 60}, {'key': 'new ', 'doc_count': 1}, {'key': 'york', 'doc_count': 1}]}}, {'key': 'jess ', 'doc_count': 50, 'region_terms': {'doc_count_error_upper_bound': 0, 'sum_other_doc_count': 0, 'buckets': [{'key': 'poland', 'doc_count': 50}]}}
With
pretty_results = []
for result in response["aggregations"]["buckets_for_name"]["buckets"]:
    d = dict()
    d["name"] = result["key"]
    d["region"] = []
    for region in result["region_terms"]["buckets"]:
        # Append to the same "region" key created above (no trailing space).
        d["region"].append(region["key"])
    pretty_results.append(d)
    print(d)
I get:
{'name': 'jess ', 'region ': ['new' , 'york', 'poland']}
I want to get this result:
{'name': 'jess ', 'region ': ['new york', 'poland']}
The region (and, I presume, name) fields were analyzed using the standard analyzer, which split "new york" into the tokens [new, york].
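To make the difference concrete, here is a rough Python sketch. It does not call Elasticsearch: standard_like_tokens is a hypothetical stand-in that only lowercases and splits on non-alphanumeric characters, which approximates the standard analyzer for plain ASCII values like these, and keyword_token mimics a keyword field keeping the whole string.

```python
import re

def standard_like_tokens(value):
    # Hypothetical approximation of the standard analyzer:
    # lowercase, then split on anything that is not a letter or digit.
    return [tok for tok in re.split(r"[^0-9a-z]+", value.lower()) if tok]

def keyword_token(value):
    # A keyword field indexes the whole string as a single, unmodified term.
    return [value]

for region in ["new york", "poland"]:
    print(region, "->", standard_like_tokens(region), "vs", keyword_token(region))
```

A terms aggregation creates one bucket per indexed term, so the analyzed field yields separate buckets for new and york, while the keyword version yields a single "new york" bucket.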
What you may want to do is set up a keyword mapping to treat the strings as standalone tokens:
PUT regions
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fielddata": true,
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "region": {
        "type": "text",
        "fielddata": true,
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
After that, perform your aggs on the .keyword fields (name.keyword and region.keyword):
{
  "size": 200,
  "aggs": {
    "buckets_for_name": {
      "terms": {
        "field": "name.keyword",
        "size": 200
      },
      "aggs": {
        "region_terms": {
          "terms": {
            "field": "region.keyword",
            "size": 200
          }
        }
      }
    }
  }
}
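On the Python side, the flattening loop from the question should then work as intended. A minimal sketch, assuming the response has the same shape as the one in the question: build_query and flatten_buckets are hypothetical helper names, and sample_response is made-up data for illustration, not real Elasticsearch output.

```python
def build_query(size):
    # Aggregate on the keyword sub-fields so multi-word values stay whole.
    return {
        "size": size,
        "aggs": {
            "buckets_for_name": {
                "terms": {"field": "name.keyword", "size": size},
                "aggs": {
                    "region_terms": {
                        "terms": {"field": "region.keyword", "size": size}
                    }
                },
            }
        },
    }

def flatten_buckets(response):
    # Turn the nested buckets into one {"name": ..., "region": [...]} dict per name.
    results = []
    for bucket in response["aggregations"]["buckets_for_name"]["buckets"]:
        regions = [r["key"] for r in bucket["region_terms"]["buckets"]]
        results.append({"name": bucket["key"], "region": regions})
    return results

# Made-up response, shaped like the one in the question (illustration only).
sample_response = {
    "aggregations": {
        "buckets_for_name": {
            "buckets": [
                {
                    "key": "jess",
                    "doc_count": 2,
                    "region_terms": {
                        "buckets": [
                            {"key": "new york", "doc_count": 1},
                            {"key": "poland", "doc_count": 1},
                        ]
                    },
                }
            ]
        }
    }
}

print(flatten_buckets(sample_response))
```

Note that "aggs" sits at the top level of the request body, next to "size", rather than inside a "query" block.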
If you would rather index the value without the space (newyork), look into the pattern_replace filter within your analyzers.
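As a sketch of that idea, the settings below define a pattern_replace character filter that strips whitespace before a keyword tokenizer emits the value as one token. The names strip_spaces and region_no_space are placeholders made up for this example, and simulate only imitates the filter's effect locally with Python's re module, so verify the behaviour against your own Elasticsearch version before relying on it.

```python
import re

# Index settings expressed as a Python dict. "strip_spaces" and
# "region_no_space" are hypothetical names chosen for this example.
settings = {
    "analysis": {
        "char_filter": {
            "strip_spaces": {
                "type": "pattern_replace",
                "pattern": "\\s+",
                "replacement": "",
            }
        },
        "analyzer": {
            "region_no_space": {
                "type": "custom",
                "tokenizer": "keyword",
                "char_filter": ["strip_spaces"],
            }
        },
    }
}

def simulate(value):
    # Local imitation of the char filter; this does not call Elasticsearch.
    cf = settings["analysis"]["char_filter"]["strip_spaces"]
    return re.sub(cf["pattern"], cf["replacement"], value)

print(simulate("new york"))
```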
EDIT (from the comments): Aggs are not part of the query. They have their own scope, so change this
{
  "query": {
    "aggs": {
      "buckets_for_name": {
to this
{
  "query": {
    // possibly leave the whole query attribute out
  },
  "aggs": {
    "buckets_for_name": {
      ...