elasticsearch, elastic-stack, elasticsearch-5, elasticsearch-aggregation

Elasticsearch aggregation with filter: unable to filter aggregation results


Hello, we are working on a project and we are stuck on this. If someone can help, that would be really great.

GET xyxz/_search
{
   "size":0,
   "aggs":{
      "company":{
         "terms":{
            "field":"skills.name.keyword",
            "size":10
         }
      }
   },
   "query":{
      "bool":{
         "must":[

         ],
         "filter":[

         ],
         "should":[
            {
               "wildcard":{
                  "skills.name":{
                     "value":"jav*"
                  }
               }
            }
         ],
         "must_not":[

         ]
      }
   }
}

NEW UPDATED QUERY

POST INDEX/_search
{
   "size": 0,
   "aggs": {
      "my_terms": {
         "terms": {
            "script": {
               "inline": """
                  if(doc['skills.name.keyword'].size()>0)
                  {
                     if(doc['skills.name.keyword'].value.contains("jav"))
                     {
                        return doc['skills.name.keyword'];
                     }
                  }
               """
            },
            "size": 10
         }
      }
   }
}

SAMPLE RESPONSE

            {
                "took" : 7469,
                "timed_out" : false,
                "_shards" : {
                    "total" : 1,
                    "successful" : 1,
                    "skipped" : 0,
                    "failed" : 0
                },
                "hits" : {
                    "total" : {
                        "value" : 10000,
                        "relation" : "gte"
                    },
                    "max_score" : null,
                    "hits" : [ ]
                },
                "aggregations" : {
                    "my_terms" : {
                        "doc_count_error_upper_bound" : 0,
                        "sum_other_doc_count" : 871,
                        "buckets" : [
                            {
                                "key" : "java",
                                "doc_count" : 121
                            },
                            {
                                "key" : "javascript",
                                "doc_count" : 77
                            },
                            {
                                "key" : "sql",
                                "doc_count" : 62
                            },
                            {
                                "key" : "core java",
                                "doc_count" : 46
                            },
                            {
                                "key" : "xml",
                                "doc_count" : 43
                            },
                            {
                                "key" : "software development",
                                "doc_count" : 36
                            },
                            {
                                "key" : "requirements analysis",
                                "doc_count" : 34
                            },
                            {
                                "key" : "microsoft sql server",
                                "doc_count" : 31
                            },
                            {
                                "key" : "java enterprise edition",
                                "doc_count" : 30
                            },
                            {
                                "key" : "jquery",
                                "doc_count" : 27
                            }
                        ]
                    }
                }
            }

Message: I would like to say a big thanks for helping me out; we have been communicating through Stack Overflow for several weeks. Thanks once again to the Stack Overflow community.


Solution

  • Solution: Aggregation Result:

    After receiving your mapping, here is what you are looking for. I've used a scripted terms aggregation:

    POST <your_index_name>/_search
    {
      "size": 0,
      "aggs": {
        "my_terms": {
          "terms": {
            "script": {
              "inline": """
                if(doc['skills.name.keyword'].size()>0){    // guard I've added: skip documents that have no skills
                    if(doc['skills.name.keyword'].value.contains("jav")){
                      return doc['skills.name.keyword'];
                    }
                }
              """
            }, 
            "size": 10
          }
        }
      }
    }
    

    Note that I've used the contains method of Java's String class. You can change the logic to suit your needs so that only the aggregation values you want are kept.
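A possible alternative to scripting, not part of the answer above: the terms aggregation also accepts an "include" parameter that restricts buckets to keys matching a regular expression. This may be worth trying because the sample response in the question still lists keys such as sql and xml, which suggests the script returns every skill of a matching document rather than only the matching ones. The sketch below only builds the request body in Python and prints it as JSON; the helper name build_terms_agg_with_include is hypothetical, and the pattern ".*jav.*" is an assumption chosen to mirror the contains("jav") check (the pattern has to match the whole key).

```python
import json


def build_terms_agg_with_include(field, pattern, size=10):
    """Build a _search body whose terms aggregation keeps only matching keys.

    Hypothetical helper for illustration; "include" is the terms
    aggregation's regex filter on bucket keys.
    """
    return {
        "size": 0,  # no hits needed, only aggregation buckets
        "aggs": {
            "my_terms": {
                "terms": {
                    "field": field,
                    "include": pattern,  # regex must match the entire key
                    "size": size,
                }
            }
        },
    }


body = build_terms_agg_with_include("skills.name.keyword", ".*jav.*")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the same _search request used above. Using "jav.*" instead would keep only keys that start with jav, which would drop a key like core java.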

    You may have to filter the aggregation response because a single document can have multiple skills, as in the sample below:

    {
      "skills": [
        {
          "name": "java"
        },
        {
          "name": "javascript"
        },
        {
          "name": "c++"
        }
        ]
    }
    

    Note that your skills field is of object datatype.

    The query returns entire documents, and the aggregation then runs on top of those results.

    So, as you can see, the above document also has c++, which would therefore also be included in the aggregation. One way to restrict the aggregation to the matching values is the scripted logic I've mentioned.
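To make this behaviour concrete, here is a small pure-Python simulation (no Elasticsearch involved; the documents and function names are made up for illustration). It mimics a query that selects documents with any skill starting with jav, then counts every skill on those documents the way a plain terms aggregation would, and finally counts only the values that contain jav.

```python
from collections import Counter

# Made-up sample documents, shaped like the one shown above.
docs = [
    {"skills": [{"name": "java"}, {"name": "javascript"}, {"name": "c++"}]},
    {"skills": [{"name": "java"}, {"name": "sql"}]},
    {"skills": [{"name": "python"}]},
]


def skill_names(doc):
    """Flatten the skills objects of one document into a list of names."""
    return [s["name"] for s in doc["skills"]]


def query_hits(documents, prefix):
    """Documents where at least one skill starts with the prefix (like jav*)."""
    return [d for d in documents if any(n.startswith(prefix) for n in skill_names(d))]


def terms_buckets(documents, keep=lambda name: True):
    """Count each distinct skill once per document, optionally filtering keys."""
    counts = Counter()
    for d in documents:
        counts.update(n for n in set(skill_names(d)) if keep(n))
    return dict(counts)


hits = query_hits(docs, "jav")
unfiltered = terms_buckets(hits)
filtered = terms_buckets(hits, keep=lambda name: "jav" in name)

print(unfiltered)  # c++ and sql appear because whole documents matched
print(filtered)    # only values containing "jav" remain
```

The query narrows the set of documents, but the bucket keys are only narrowed when the values themselves are filtered.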

    Autocomplete Question:

    The second question is about the autocomplete feature. For that you would need to read a bit, as there are various ways to achieve it.

    However, I'd suggest you start by understanding the analysis phase of Elasticsearch: what an analyzer is and the parts that make it up. Then move on to reading about the edge n-gram tokenizer and the completion suggester.

    It will take a while to grasp all these concepts, but once you get the hang of them, it's relatively easy to implement.

    Note that I would not recommend wildcard queries. Once you are familiar with the n-gram or edge n-gram tokenizers, your query could be as simple as a match query for jav. But do read about the concepts mentioned above.
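The idea behind edge n-grams can be sketched in plain Python. This only imitates what an edge n-gram tokenizer would produce at index time; edge_ngrams, min_gram, and max_gram are illustrative names and defaults chosen here, not Elasticsearch calls. Each word is indexed under all of its prefixes, so a search for jav becomes an exact lookup instead of a wildcard scan.

```python
def edge_ngrams(text, min_gram=1, max_gram=10):
    """Return the prefixes of each lowercased word, from min_gram to max_gram chars."""
    grams = []
    for word in text.lower().split():
        upper = min(len(word), max_gram)
        grams.extend(word[:n] for n in range(min_gram, upper + 1))
    return grams


# Build a tiny inverted index: prefix -> set of skill names containing it.
skills = ["java", "javascript", "core java", "sql"]
index = {}
for skill in skills:
    for gram in edge_ngrams(skill):
        index.setdefault(gram, set()).add(skill)

# A lookup for "jav" is now a plain dictionary hit, with no wildcard needed.
matches = sorted(index.get("jav", set()))
print(matches)
```

The trade-off is a larger index in exchange for cheaper prefix searches, which is why the choice of minimum and maximum gram length matters.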

    Let me know if this helps and if you want any further clarification.