elasticsearch filter group-by elasticsearch-aggregation

Elasticsearch terms aggregation and querying


I have two types of log messages:

Jul 23 09:24:16 rrr mrr-core[222]: Aweg3AOMTs_1563866656871111.mt processMTMessage() #12798 realtime: 5.684 ms

Jul 23 09:24:18 rrr mrr-core[2222]: Aweg3AOMTs_1563866656871111.0.dn processDN() #7750 realtime: 1.382 ms

The first is a sent message, and the second confirms that the message was delivered.

The difference between them is the suffix, which I have separated from the id so that I can query it.

These messages are parsed and stored in Elasticsearch in the following format:

messageId: Aweg3AOMTs_1563866656871111.0.dn
text: Aweg3AOMTs
num1: 1563866656871111
num2: 0
suffix: mt/dn

I would like to find out which messages were successfully delivered and which weren't. I am a complete beginner with Elasticsearch, so I'm really struggling.

I'm trying terms aggregations at the moment, but all I have managed so far is this:

GET /my_index3/_search
{
  "size": 0,
  "aggs": {
    "num1": {
      "terms": {
        "field": "messageId.keyword",
        "include": ".*mt*."
      }
    }
  } 
}

This shows me the sent messages. I don't know how to add a filter or clause that would show only the messages that have both an mt and a dn suffix.

If anyone has an idea I'd be really thankful :))


Solution

  • Running the terms aggregation on messageId.keyword does not help, because every messageId is different ('Aweg3AOMTs_1563866656871111.0.dn' is not the same as 'Aweg3AOMTs_1563866656871111.mt'), so the two messages never land in the same bucket.

    Looking at the document structure, I think it is better to run the terms aggregation on num1, which is the part common to the .mt and .dn messages. That aggregation gives you the number of documents for each unique num1: a message that has both a request (.mt) and a response (.dn) has a count of 2, while a message with only a request has a count of 1.

    If you also want to see the documents themselves, you can add a nested aggregation inside, such as a top_hits aggregation with size 1. The query below instead adds a cardinality sub-aggregation that counts the distinct suffixes per num1, and a bucket_selector that keeps only the buckets whose document count is 2:

    GET /my_index3/_search
    {
      "size": 0,
      "aggs": {
        "num1": {
          "terms": {
            "field": "num1",
            "order": {
              "_count": "desc"
            }
          },
          "aggs": {
            "count_of_distinct_suffix": {
              "cardinality": {
                "field": "suffix"
              }
            },
            "filter_count_is_2": {
              "bucket_selector": {
                "buckets_path": {
                  "the_doc_count": "_count"
                },
                "script": "params.the_doc_count == 2"
              }
            }
          }
        }
      }
    }
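
To make the counting logic above concrete, here is a minimal Python sketch that imitates it on in-memory documents. It is only an illustration, not Elasticsearch code: the function name split_by_delivery and the sample list are hypothetical, and the second num1 value is made up to represent a message that was sent but never confirmed. It checks the set of distinct suffixes per num1 (the role the cardinality sub-aggregation plays) rather than the raw document count, so a duplicated .mt line would not be mistaken for a delivery.

```python
from collections import defaultdict

# Hypothetical parsed documents. Only the first num1 value appears in the
# question; the second one is invented to show an undelivered message.
docs = [
    {"num1": 1563866656871111, "suffix": "mt"},
    {"num1": 1563866656871111, "suffix": "dn"},
    {"num1": 1563866656872222, "suffix": "mt"},
]


def split_by_delivery(documents):
    """Group documents by num1 and split the ids into delivered / undelivered."""
    suffixes_by_id = defaultdict(set)
    for doc in documents:
        suffixes_by_id[doc["num1"]].add(doc["suffix"])

    # Delivered: both the sent (mt) and the delivery note (dn) were logged.
    delivered = sorted(
        num1 for num1, seen in suffixes_by_id.items() if {"mt", "dn"} <= seen
    )
    # Undelivered: the message was sent, but no delivery note was logged.
    undelivered = sorted(
        num1
        for num1, seen in suffixes_by_id.items()
        if "mt" in seen and "dn" not in seen
    )
    return delivered, undelivered


delivered, undelivered = split_by_delivery(docs)
print("delivered:", delivered)
print("undelivered:", undelivered)
```

In the aggregation itself, the same split corresponds to keeping the buckets with a count of 2 (delivered) or changing the bucket_selector condition to a count of 1 (sent only).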