
Match Substring email address at specific location in ELK


I am trying to find log entries that contain a given email address in the message field, using the Discover section of Kibana (ELK). I get results with this query:

@message:"abc@email.com"

However, the results include other messages where the email should not match, and I have been unable to build a query that excludes them.

The results are (data has been sanitized for security reasons):

@message:[INF] [2020-07-07 12:54:51.105] [PID-1] : [abcdefg] [JID-5c] [data] LIST_LOOKUP: abc@email.com | User List from Profiles | name | user_name @id:355502086986714

@message:[INF] [2020-07-07 12:38:36.755] [PID-2] : [abcdefg] [JID-ed2] [data] LIST_LOOKUP: abc@email.com | User List from Profiles | name | user_name @id:355501869671304

@message:[INF] [2020-07-07 12:19:48.141] [PID-3] [abc@email.com] : [c5] [data] Completed 200 OK in 11ms @id:355501617979964834

@message:[INF] [2020-07-07 11:19:48.930] [PID-5] [abc@email.com] : [542] [data] Completed 200 OK in 9ms @id:35550081535

whereas I want only:

@message:[INF] [2020-07-07 12:19:48.141] [PID-3] [abc@email.com] : [c5] [data] Completed 200 OK in 11ms @id:355501617979964834

@message:[INF] [2020-07-07 11:19:48.930] [PID-5] [abc@email.com] : [542] [data] Completed 200 OK in 9ms @id:35550081535

I've tried @message: "[PID-*] [abc@email.com]", @message: "\[PID-*\] \[abc@email.com\] \:", @message: "[abc@email.com]", @message: *abc@email.com* and several similar searches, without success.

Please let me know what I am missing here, and how to run efficient substring searches in Kibana Discover using KQL/Lucene.

Here is the mapping for my index (the data comes from CloudWatch Logs):

{
   "cwl-*":{
      "mappings":{
         "properties":{
            "@id":{
               "type":"string"
            },
            "@log_stream":{
               "type":"string"
            },
            "@log_group":{
               "type":"string"
            },
            "@message":{
               "type":"string"
            },
            "@owner":{
               "type":"string"
            },
            "@timestamp":{
               "type":"date"
            }
         }
      }
   }
}

Solution

  • As @Gibbs already mentioned, the cause is that all of your data contains the string abc@email.com. Your mapping now confirms that you are using a string field without an explicit analyzer, so it uses the default standard analyzer, which splits abc@email.com at the @ sign and discards the surrounding brackets.

    Instead, map the field that holds the email address to a custom analyzer that uses the UAX URL Email tokenizer, which keeps an email address as a single token instead of splitting it.

    Example of how to create this analyzer:

    Mapping with custom email analyzer

    {
        "settings": {
            "analysis": {
                "analyzer": {
                    "email_analyzer": {
                        "tokenizer": "my_tokenizer"
                    }
                },
                "tokenizer": {
                    "my_tokenizer": {
                        "type": "uax_url_email"
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                "email": {
                    "type": "text",
                    "analyzer": "email_analyzer"
                }
            }
        }
    }
    

    Analyze API request, followed by its response

    POST http://{{hostname}}:{{port}}/{{index-name}}/_analyze

    {
        "analyzer": "email_analyzer",
        "text": "abc@email.com"
    }
    
    
    {
        "tokens": [
            {
                "token": "abc@email.com",
                "start_offset": 0,
                "end_offset": 13,
                "type": "<EMAIL>",
                "position": 0
            }
        ]
    }
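
To see why the original phrase query matched all four messages, the sketch below imitates the standard analyzer in plain Python. It is only a rough approximation: the regular expression and the helper names `approx_standard_tokens` and `contains_phrase` are assumptions made for illustration, not part of Elasticsearch, and the real tokenizer follows the Unicode UAX #29 word-break rules. It is enough to show that brackets and the @ sign never reach the index, so a phrase query cannot tell the two kinds of log line apart.

```python
import re

# Rough stand-in for the standard analyzer (an assumption, not the real
# tokenizer): keep runs of letters, digits and underscores, allow a single
# dot or apostrophe inside a token, and lowercase the result.
TOKEN = re.compile(r"[A-Za-z0-9_]+(?:[.'][A-Za-z0-9_]+)*")


def approx_standard_tokens(text):
    """Return an approximate list of the tokens the standard analyzer emits."""
    return [t.lower() for t in TOKEN.findall(text)]


def contains_phrase(tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run inside tokens."""
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens
               for i in range(len(tokens) - n + 1))


# Two of the sanitized messages from the question, shortened.
lookup_line = ("[INF] [2020-07-07 12:54:51.105] [PID-1] : [abcdefg] [JID-5c] "
               "[data] LIST_LOOKUP: abc@email.com | User List from Profiles")
request_line = ("[INF] [2020-07-07 12:19:48.141] [PID-3] [abc@email.com] : "
                "[c5] [data] Completed 200 OK in 11ms")

# The brackets in the query are dropped in the same way as in the documents.
query = approx_standard_tokens("[abc@email.com]")
print(query)

for line in (lookup_line, request_line):
    print(contains_phrase(approx_standard_tokens(line), query))
```

Under this approximation both lines contain the same token sequence as the query, which is consistent with the unwanted LIST_LOOKUP hits in the question. Adding brackets or escaping them in the query does not help, because they are removed at analysis time on both sides.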