Tags: regex, logstash, logstash-grok

Extract multiple instances of field data from a single log line into a multi-valued field


I am looking to extract multiple instances of the same field from a single log line. For example, suppose I had the following log record:

Recipients: alice@somedomain.com bob@someotherdomain.com carl@carlsplace.org

I don't know in advance how many email addresses will be listed.

Related to this, in some earlier work, I processed log records that looked like this:

Step=12305, Step=11006, Step=11001, Step=11018, Step=12304, Step=11522, Step=11806

In that case, I took advantage of the kv{} filter, which automatically produced a nice, multi-valued field like this (a sketch of that kv configuration follows the output below):

"Step": [
      "12305",
      "11006",
      "11001",
      "11018",
      "12304",
      "11522",
      "11806"
    ],
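For context, the kv setup for that earlier case was essentially the following sketch. The field_split value here is my assumption, chosen to split on the commas and spaces between the Step= pairs; value_split defaults to "=", which already fits:

    filter {
        kv {
            source      => "message"
            field_split => ", "    # treat both comma and space as separators
        }
    }

When kv encounters the same key more than once, it collects the values into an array, which is what produced the multi-valued Step field.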

I would like to get the same kind of multi-valued field as my result, but I cannot simply use kv again because the real log lines are messier than my original example. They look more like this:

Recipients: Unwanted_text alice@somedomain.com other junk bob@someotherdomain.com some.hostname.net 1 carl@carlsplace.org even-more

I would like a grok expression that would capture any number of email addresses (%{EMAILADDRESS}), wherever they appear in the log line, and put them into a multi-valued field. Can someone suggest how to do this?

Thanks,

Chris


Solution

    input {
        beats {
            port => 5044    # 5044 is the conventional Beats port; change to match your setup
        }
    }

    filter {
        # Rewrite every email address in the message as "email=<address>";
        # the regex is grok's EMAILADDRESS pattern wrapped in a capture group.
        mutate {
            gsub => [
                "message", "([a-zA-Z][a-zA-Z0-9_.+-=:]+@\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b))", "email=\1"
            ]
        }

        # kv then gathers the repeated email= keys into a single array field
        kv {
            source => "message"
        }
    }

    output {
        elasticsearch {
            hosts         => ["localhost:9200"]    # "hosts" (plural) is the current option name
            index         => "manual"
            document_type => "log"
        }
    }


I tested the above configuration with Filebeat reading the input log from a file and sending it to Logstash.
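
If you would rather try the filter chain without Filebeat or Elasticsearch in the loop, a minimal stdin/stdout variant could look like this; the input and output plugins here are just stand-ins for local testing:

    input { stdin {} }

    filter {
        mutate {
            gsub => [
                "message", "([a-zA-Z][a-zA-Z0-9_.+-=:]+@\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b))", "email=\1"
            ]
        }
        kv { source => "message" }
    }

    # rubydebug prints each event as a hash so you can inspect the "email" array
    output { stdout { codec => rubydebug } }

Paste a sample Recipients line on stdin and check that the email field comes back as an array.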

Explanation:

1. I used gsub to replace every occurrence of an email address in the input message with email= followed by the captured address (the \1 backreference).

2. The regex used here is simply grok's own email-address pattern; I just added a capture group around it so the matched address can be referenced as \1. The stock patterns are shown after this list for reference.

3. Then I used the kv filter to extract the email addresses into a multi-valued field.
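
For reference, these are the grok patterns the regex was assembled from, as shipped in logstash-patterns-core (quoted from memory, so double-check against the version you have installed):

    EMAILLOCALPART [a-zA-Z][a-zA-Z0-9_.+-=:]+
    HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
    EMAILADDRESS %{EMAILLOCALPART}@%{HOSTNAME}

Expanding EMAILADDRESS and wrapping the result in parentheses gives exactly the expression used in the gsub above.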

Example:

Input message:

    Recipients: Unwanted_text alice@somedomain.com other junk bob@someotherdomain.com some.hostname.net 1 carl@carlsplace.org even-more

gsub converts the input message to:

    Recipients: Unwanted_text email=alice@somedomain.com other junk email=bob@someotherdomain.com some.hostname.net 1 email=carl@carlsplace.org even-more

and then the kv filter creates an 'email' array containing all the email addresses:

    "email": [
        "alice@somedomain.com",
        "bob@someotherdomain.com",
        "carl@carlsplace.org"
    ]