Tags: regex, logstash, logstash-grok

Extract multiple instances of field data from a single log line into a multi-valued field


I am looking to extract multiple instances of the same field from a single log line. For example, suppose I had the following log record:

Recipients: alice@somedomain.com bob@someotherdomain.com carl@carlsplace.org

I don't know in advance how many email addresses will be listed.

Related to this, in some earlier work, I processed log records that looked like this:

Step=12305, Step=11006, Step=11001, Step=11018, Step=12304, Step=11522, Step=11806

In that case, I took advantage of the kv{} filter, which automatically produced a nice, multi-valued field like this (a sketch of that kv configuration follows the output below):

"Step": [
      "12305",
      "11006",
      "11001",
      "11018",
      "12304",
      "11522",
      "11806"
    ],
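For context, the kv setup for that earlier case was essentially the following sketch. The field_split value here is my assumption, chosen to split on the commas and spaces between the Step= pairs; value_split defaults to "=", which already fits:

    filter {
        kv {
            source      => "message"
            field_split => ", "    # treat both comma and space as separators
        }
    }

When kv encounters the same key more than once, it collects the values into an array, which is what produced the multi-valued Step field.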

I would like to get the same kind of multi-valued field as my result, but I cannot simply use kv again because the real log lines are messier than my original example. They look more like this:

Recipients: Unwanted_text alice@somedomain.com other junk bob@someotherdomain.com some.hostname.net 1 carl@carlsplace.org even-more

I would like a grok expression that would capture any number of email addresses (%{EMAILADDRESS}), wherever they appear in the log line, and put them into a multi-valued field. Can someone suggest how to do this?

Thanks,

Chris


Solution

    input {
        beats {
            port => 5044    # 5044 is the conventional Beats port; change to match your setup
        }
    }

    filter {
        # Rewrite every email address in the message as "email=<address>";
        # the regex is grok's EMAILADDRESS pattern wrapped in a capture group.
        mutate {
            gsub => [
                "message", "([a-zA-Z][a-zA-Z0-9_.+-=:]+@\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b))", "email=\1"
            ]
        }

        # kv then gathers the repeated email= keys into a single array field
        kv {
            source => "message"
        }
    }

    output {
        elasticsearch {
            hosts         => ["localhost:9200"]    # "hosts" (plural) is the current option name
            index         => "manual"
            document_type => "log"
        }
    }


I tested the above configuration with Filebeat reading the input log from a file and sending it to Logstash.
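
If you would rather try the filter chain without Filebeat or Elasticsearch in the loop, a minimal stdin/stdout variant could look like this; the input and output plugins here are just stand-ins for local testing:

    input { stdin {} }

    filter {
        mutate {
            gsub => [
                "message", "([a-zA-Z][a-zA-Z0-9_.+-=:]+@\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b))", "email=\1"
            ]
        }
        kv { source => "message" }
    }

    # rubydebug prints each event as a hash so you can inspect the "email" array
    output { stdout { codec => rubydebug } }

Paste a sample Recipients line on stdin and check that the email field comes back as an array.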

Explanation:

1. I used gsub to replace every occurrence of an email address in the input message with email= followed by the captured address (the \1 backreference).

2. The regex used here is simply grok's own email-address pattern; I just added a capture group around it so the matched address can be referenced as \1. The stock patterns are shown after this list for reference.

3. Then I used the kv filter to extract the email addresses into a multi-valued field.
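
For reference, these are the grok patterns the regex was assembled from, as shipped in logstash-patterns-core (quoted from memory, so double-check against the version you have installed):

    EMAILLOCALPART [a-zA-Z][a-zA-Z0-9_.+-=:]+
    HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
    EMAILADDRESS %{EMAILLOCALPART}@%{HOSTNAME}

Expanding EMAILADDRESS and wrapping the result in parentheses gives exactly the expression used in the gsub above.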

Example:

Input message:

    Recipients: Unwanted_text alice@somedomain.com other junk bob@someotherdomain.com some.hostname.net 1 carl@carlsplace.org even-more

gsub converts the input message to:

    Recipients: Unwanted_text email=alice@somedomain.com other junk email=bob@someotherdomain.com some.hostname.net 1 email=carl@carlsplace.org even-more

and then the kv filter creates an 'email' array containing all the email addresses:

    "email": [
        "alice@somedomain.com",
        "bob@someotherdomain.com",
        "carl@carlsplace.org"
    ]