Situation:
I have an enormous number of docs with no unique key field, but a combination of several fields identifies each one.
For example, the search_term field can be duplicated, but each search_term within category1-category2-category3 is unique per day (report_date).
For example, my docs look like:
{
  "category1": "AD",
  "category2": "GOOGLE",
  "category3": "SEARCH",
  ...too many details...
  "search_term": "SAMSUNG TV",
  "report_date": 20230919
}
I've tried:
The query below does not return the total number of unique keys (I have no idea how to get that):
{
  "_source": false,
  "aggs": {
    "unique_keys": {
      "composite": {
        "size": 2,
        "sources": [
          { "search_term": { "terms": { "field": "search_term.keyword" } } },
          { "category1": { "terms": { "field": "category1" } } },
          { "category2": { "terms": { "field": "category2" } } },
          { "category3": { "terms": { "field": "category3" } } }
        ]
      },
      "aggs": {
        "distinct_docs": {
          "top_hits": {
            "size": 1,
            "_source": [
              "search_term",
              "category1",
              "category2",
              "category3"
            ],
            "sort": [
              {
                "report_date": { "order": "desc" }
              }
            ]
          }
        }
      }
    }
  },
  "size": 0,
  "query": {
    "bool": {
      "minimum_should_match": "1",
      "should": [
        {
          "match": {
            "search_term": {
              "operator": "and",
              "query": "SAMSUNG TV"
            }
          }
        }
      ]
    }
  }
}
What I want:
I want to search for a word (or words) and get back every category combination that a matching search_term belongs to. Each aggregated entry should reflect the most recent report (report_date), and the response should also include the total number of unique keys.
What I need looks like this (the format doesn't matter):
{
  "total_count": 3,
  "buckets": [
    {
      "key": {
        "search_term": "SAMSUNG TV",
        "category1": "AD",
        "category2": "GOOGLE",
        "category3": "SEARCH"
      }
    },
    {
      "key": {
        "search_term": "SAMSUNG TV",
        "category1": "AD",
        "category2": "GOOGLE",
        "category3": "DISPLAY"
      }
    },
    {
      "key": {
        "search_term": "SAMSUNG TV 32",
        "category1": "AD",
        "category2": "FACEBOOK",
        "category3": "DISPLAY"
      }
    }
  ]
}
There is no bucket_count in OpenSearch, and I can't use cardinality across multiple keys.
I badly need any hints! Thanks!
This should serve your purpose. Adjust the composite size as needed; set it to a sufficiently large number.
{
  "size": 0,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "search_term": "SAMSUNG TV"
          }
        }
      ]
    }
  },
  "aggs": {
    "unique_categories": {
      "composite": {
        "size": 10000,
        "sources": [
          { "search_term": { "terms": { "field": "search_term.keyword" } } },
          { "category1": { "terms": { "field": "category1.keyword" } } },
          { "category2": { "terms": { "field": "category2.keyword" } } },
          { "category3": { "terms": { "field": "category3.keyword" } } }
        ]
      },
      "aggs": {
        "distinct_docs": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "report_date": "desc"
              }
            ],
            "_source": false
          }
        }
      }
    }
  }
}
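The query itself does not return a total count, so one option is to count the composite buckets on the client side. The sketch below is only an illustration under some assumptions: it pages through the composite aggregation using the after_key that each response returns, collects the bucket keys, and reports how many there were. The function name collect_unique_keys, the search callable, and page_size are made up for this example and are not part of OpenSearch; the top_hits sub-aggregation is left out for brevity.

```python
from typing import Any, Callable, Dict, List, Optional


def collect_unique_keys(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],  # hypothetical: sends a body, returns the response dict
    query: Dict[str, Any],
    sources: List[Dict[str, Any]],
    agg_name: str = "unique_categories",
    page_size: int = 1000,
) -> Dict[str, Any]:
    """Page through a composite aggregation and count every bucket."""
    buckets: List[Dict[str, Any]] = []
    after: Optional[Dict[str, Any]] = None

    while True:
        composite: Dict[str, Any] = {"size": page_size, "sources": sources}
        if after is not None:
            # Resume right after the last key of the previous page.
            composite["after"] = after

        body = {
            "size": 0,
            "query": query,
            "aggs": {agg_name: {"composite": composite}},
        }
        agg = search(body)["aggregations"][agg_name]
        page = agg["buckets"]
        buckets.extend({"key": bucket["key"]} for bucket in page)

        # after_key is absent once there is nothing left to page through;
        # a short page also means this was the last one.
        after = agg.get("after_key")
        if after is None or len(page) < page_size:
            break

    return {"total_count": len(buckets), "buckets": buckets}
```

With the opensearch-py client, the search argument could be something like lambda body: client.search(index="my-index", body=body), where "my-index" is a placeholder index name. Paging with a moderate page_size also avoids having to guess one composite size that is large enough for every search.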