I am using elasticsearch-dsl in Django. I have a DocType document defined, with a Keyword field that holds a list of values.
Here is my code:
from elasticsearch_dsl import DocType, Text, Keyword

class ProductIndex(DocType):
    """
    Index for products
    """
    id = Keyword()
    slug = Keyword()
    name = Text()
    filter_list = Keyword()
filter_list is the array here, and it contains multiple values. Now I have a list of distinct values, say sample_filter_list, and some of these elements may be present in some product's filter_list array. Given this sample_filter_list, I want all the unique elements of filter_list across all the products whose filter_list has a non-empty intersection with sample_filter_list.
For example, I have 5 products whose filter_list values are:
1) ['a', 'b', 'c']
2) ['d', 'e', 'f']
3) ['g', 'h', 'i']
4) ['j', 'k', 'l']
5) ['m', 'n', 'o']
and my sample_filter_list is ['a', 'd', 'g', 'j', 'm'],
then Elasticsearch should return an array containing the distinct elements,
i.e. ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o']
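To pin down the expected result, here is a minimal plain-Python sketch of the same logic, with no Elasticsearch involved. The function name `expected_filter_values` is made up for illustration; it only restates the requirement above on in-memory lists.

```python
def expected_filter_values(products, sample_filter_list):
    """Union of every filter_list that shares at least one value with the sample."""
    sample = set(sample_filter_list)
    result = set()
    for filter_list in products:
        # Keep this product only if its filter_list intersects the sample.
        if sample.intersection(filter_list):
            result.update(filter_list)
    return sorted(result)


products = [
    ["a", "b", "c"],
    ["d", "e", "f"],
    ["g", "h", "i"],
    ["j", "k", "l"],
    ["m", "n", "o"],
]
sample_filter_list = ["a", "d", "g", "j", "m"]

print(expected_filter_values(products, sample_filter_list))
```

With a sample of just ['a'], only the first product matches, so the result would be ['a', 'b', 'c'].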
This answer is general rather than specific to Django.
Suppose you have an ES index some_index2 with this mapping:
PUT some_index2
{
  "mappings": {
    "some_type": {
      "dynamic_templates": [
        {
          "strings": {
            "mapping": {
              "type": "string"
            },
            "match_mapping_type": "string"
          }
        }
      ],
      "properties": {
        "field1": {
          "type": "string"
        },
        "field2": {
          "type": "string"
        }
      }
    }
  }
}
You have also indexed these documents:
{
  "field1": "id1",
  "field2": ["a", "b", "c", "d"]
}
{
  "field1": "id2",
  "field2": ["e", "f", "g"]
}
{
  "field1": "id3",
  "field2": ["e", "l", "k"]
}
As you stated, you want all the distinct values of field2 (filter_list in your case). You can get them with an Elasticsearch terms aggregation:
GET some_index2/_search
{
  "aggs": {
    "some_name": {
      "terms": {
        "field": "field2",
        "size": 10000
      }
    }
  },
  "size": 0
}
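The request above aggregates over every document in the index, whereas the question only wants products whose filter_list shares a value with sample_filter_list. One possible way to add that restriction is a terms query inside a bool filter. The sketch below only builds the request body as a Python dict; the helper name `build_distinct_values_body` and the aggregation name `distinct_values` are made up for illustration, and sending the body with a client is left out.

```python
import json


def build_distinct_values_body(field, sample_values, max_buckets=10000):
    """Build a search body: filter on the sample values, then aggregate the field."""
    return {
        # No hits are needed, only the aggregation result.
        "size": 0,
        # Keep documents whose field contains at least one of the sample values.
        "query": {
            "bool": {
                "filter": [{"terms": {field: list(sample_values)}}]
            }
        },
        # One bucket per distinct value of the field in the matching documents.
        "aggs": {
            "distinct_values": {
                "terms": {"field": field, "size": max_buckets}
            }
        },
    }


body = build_distinct_values_body("filter_list", ["a", "d", "g", "j", "m"])
print(json.dumps(body, indent=2))
```

Because the filter runs before the aggregation, the buckets contain every value of the matching documents, not only the values listed in the sample. Note that `size` caps the number of buckets returned, so it has to be at least the number of distinct values you expect.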
For the three sample documents, the GET request returns:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "some_name": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "e",
          "doc_count": 2
        },
        {
          "key": "a",
          "doc_count": 1
        },
        {
          "key": "b",
          "doc_count": 1
        },
        {
          "key": "c",
          "doc_count": 1
        },
        {
          "key": "d",
          "doc_count": 1
        },
        {
          "key": "f",
          "doc_count": 1
        },
        {
          "key": "g",
          "doc_count": 1
        },
        {
          "key": "k",
          "doc_count": 1
        },
        {
          "key": "l",
          "doc_count": 1
        }
      ]
    }
  }
}
Here, buckets contains all the distinct values.
You can iterate through the buckets and read the value under key in each one.
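As a minimal sketch of that iteration in Python, assuming the response has already been parsed into a dict: the helper name `distinct_keys` is hypothetical, and the `response` below is a trimmed copy of the output shown earlier, not a live result.

```python
def distinct_keys(response, agg_name):
    """Return the key of every bucket in the named terms aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Trimmed copy of the response above (first three buckets only).
response = {
    "aggregations": {
        "some_name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "e", "doc_count": 2},
                {"key": "a", "doc_count": 1},
                {"key": "b", "doc_count": 1},
            ],
        }
    }
}

print(distinct_keys(response, "some_name"))  # ['e', 'a', 'b']
```

The keys come back in bucket order, which by default is descending doc_count, so sort them yourself if you need alphabetical output.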
Hope this is what you need.