elasticsearch, elasticsearch-aggregation, opensearch, amazon-opensearch

How to get the total number of aggregated keys in AWS OpenSearch


Situation:

I have an enormous number of documents with no unique key field, but a combination of several fields identifies each one. For example, the search_term field can be duplicated, but each search_term within a category1-category2-category3 combination is unique per day (report_date).

My documents look like this:

{
  "category1":"AD",
  "category2":"GOOGLE",
  "category3":"SEARCH",

  ...too many details...

  "search_term":"SAMSUNG TV",
  "report_date":20230919
}

I've tried:

My query below fails to return the total number of unique keys (I have no idea how to get it):

{
  "_source": false,
  "aggs": {
    "unique_keys": {
      "composite": {
        "size": 2, 
        "sources": [
          { "search_term": { "terms": { "field": "search_term.keyword" } } },
          { "category1": { "terms": { "field": "category1" } } },
          { "category2": { "terms": { "field": "category2" } } },
          { "category3": { "terms": { "field": "category3" } } }
        ]
      },
      "aggs": {
        "distinct_docs": {
          "top_hits": {
            "size": 1,
            "_source": [
              "search_term",
              "category1",
              "category2",
              "category3"
            ], 
            "sort": [
              {
                "report_date": {"order": "desc"}
              }
            ]
          }
        }
      }
    }
  },
  "size": 0, 
  "query": {
    "bool": {
      "minimum_should_match": "1",
      "should": [
        {
          "match": {
            "search_term": {
              "operator": "and",
              "query": "SAMSUNG TV"
            }
          }
        }
      ]
    }
  }
}

What I want:

I want to search for a word (or words) and get back every category combination that a matching search_term belongs to. Each aggregated entry should reflect the most recent report (report_date), and the response should also include the total count of unique keys.

What I need looks like this (the format doesn't matter):

{
  "total_count": 3,
  "buckets": [
    {
      "key": {
        "search_term": "SAMSUNG TV",
        "category1": "AD",
        "category2": "GOOGLE",
        "category3": "SEARCH"
      }
    },
    {
      "key": {
        "search_term": "SAMSUNG TV",
        "category1": "AD",
        "category2": "GOOGLE",
        "category3": "DISPLAY"
      }
    },
    {
      "key": {
        "search_term": "SAMSUNG TV 32",
        "category1": "AD",
        "category2": "FACEBOOK",
        "category3": "DISPLAY"
      }
    }
  ]
}

OpenSearch has no bucket_count, and I can't use cardinality across multiple keys.

I badly need any hints. Thanks!


Solution

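  • This should serve your purpose. The composite aggregation returns one bucket per unique combination of the four fields, so the total count is the number of buckets in the response (count them on the client side).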

    {
      "size": 0,
      "query": {
        "bool": {
          "should": [
            {
              "match": {
                "search_term": "SAMSUNG TV"
              }
            }
          ]
        }
      },
      "aggs": {
        "unique_categories": {
          "composite": {
            "size": 10000,  // Adjust the size as needed, set it to a sufficiently large number
            "sources": [
              { "search_term": { "terms": { "field": "search_term.keyword" } } },
              { "category1": { "terms": { "field": "category1.keyword" } } },
              { "category2": { "terms": { "field": "category2.keyword" } } },
              { "category3": { "terms": { "field": "category3.keyword" } } }
            ]
          },
          "aggs": {
            "distinct_docs": {
              "top_hits": {
                "size": 1,
                "sort": [
                  {
                    "report_date": "desc"
                  }
                ],
                "_source": false
              }
            }
          }
        }
      }
    }
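
If the number of unique combinations might exceed a single composite page, one option is to page through the aggregation with its after_key and count buckets on the client. The following is only a sketch under some assumptions: `collect_unique_keys`, `search`, and `page_size` are hypothetical names, `search` is any callable that sends a request body and returns the parsed response as a dict, and the field names follow the query above. The top_hits sub-aggregation is left out for brevity and could be added back under "aggs" if the latest document per key is needed.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical type: any function that takes a request body and returns
# the parsed search response as a dict.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def collect_unique_keys(
    search: SearchFn, query: Dict[str, Any], page_size: int = 1000
) -> Dict[str, Any]:
    """Page through a composite aggregation and count every bucket."""
    # Field names are taken from the query above; adjust to your mapping.
    sources = [
        {"search_term": {"terms": {"field": "search_term.keyword"}}},
        {"category1": {"terms": {"field": "category1.keyword"}}},
        {"category2": {"terms": {"field": "category2.keyword"}}},
        {"category3": {"terms": {"field": "category3.keyword"}}},
    ]
    buckets: List[Dict[str, Any]] = []
    after_key: Optional[Dict[str, Any]] = None

    while True:
        composite: Dict[str, Any] = {"size": page_size, "sources": sources}
        if after_key is not None:
            # Resume right after the last key of the previous page.
            composite["after"] = after_key
        body = {
            "size": 0,
            "query": query,
            "aggs": {"unique_categories": {"composite": composite}},
        }
        agg = search(body)["aggregations"]["unique_categories"]
        page = agg.get("buckets", [])
        buckets.extend({"key": b["key"]} for b in page)
        after_key = agg.get("after_key")
        # An empty page or a missing after_key means nothing is left.
        if not page or after_key is None:
            break

    return {"total_count": len(buckets), "buckets": buckets}
```

With the opensearch-py client, `search` could be something like `lambda body: client.search(index="my-index", body=body)`, where the index name is a placeholder. The returned dict has the total_count and buckets shape asked for in the question.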