I am looking to extract multiple instances of the same field from a single log line. For example, suppose I had the following log record:
Recipients: alice@somedomain.com bob@someotherdomain.com carl@carlsplace.org
I don't know in advance how many email addresses will be listed.
Related to this, in some earlier work, I processed log records that looked like this:
Step=12305, Step=11006, Step=11001, Step=11018, Step=12304, Step=11522, Step=11806
In that case, I took advantage of the kv{} filter, which automatically produced a nice, multi-valued field like this:
"Step": [
"12305",
"11006",
"11001",
"11018",
"12304",
"11522",
"11806"
],
I would like to get the same kind of multi-valued field as my result, but cannot simply use kv again because the actual log lines are messier than my original example. The actual log lines are more like this:
Recipients: Unwanted_text alice@somedomain.com other junk bob@someotherdomain.com some.hostname.net 1 carl@carlsplace.org even-more
I would like a grok expression that captures any number of email addresses (%{EMAILADDRESS}), wherever they appear in the log line, and puts them into a multi-valued field. Can someone suggest how to do this?
Thanks,
Chris
input {
  beats {
    # 5044 is the conventional Beats port; replace it with your own
    port => 5044
  }
}
filter {
  mutate {
    gsub => [
      "message","([a-zA-Z][a-zA-Z0-9_.+-=:]+@\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b))","email=\1"
    ]
  }
  kv {
    source => "message"
  }
}
output {
  elasticsearch {
    # Logstash 2.0 and later take "hosts" (an array); older releases used "host"
    hosts => ["localhost:9200"]
    index => "manual"
    document_type => "log"
  }
}
I tested the configuration above with Filebeat reading the input log from a file and sending it to Logstash.
Explanation:
I used gsub to replace every occurrence of an email address in the input message with email= followed by the captured email address.
The regex is the same one grok uses for email addresses; I only wrapped it in a capture group so that the matched address can be referenced as \1.
Then I used the kv filter to extract the email addresses.
Example:
Input message:
Recipients: Unwanted_text alice@somedomain.com other junk bob@someotherdomain.com some.hostname.net 1 carl@carlsplace.org even-more
gsub converts the input message to:
Recipients: Unwanted_text email=alice@somedomain.com other junk email=bob@someotherdomain.com some.hostname.net 1 email=carl@carlsplace.org even-more
The kv filter then creates an array field, email, which contains all the email addresses:
"email": [
"alice@somedomain.com",
"bob@someotherdomain.com",
"carl@carlsplace.org"
]
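To check the regex before deploying the pipeline, the two steps can be reproduced outside Logstash. The following is only a rough Python sketch: tag_emails and simple_kv are hypothetical helper names, and simple_kv is a simplified stand-in for the kv filter (whitespace-separated tokens split on the first "="), not its real implementation.

```python
import re

# Email regex copied from the gsub step above
# (grok's email pattern wrapped in a capture group).
EMAIL_RE = re.compile(
    r"([a-zA-Z][a-zA-Z0-9_.+-=:]+@\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})"
    r"(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b))"
)


def tag_emails(message):
    """Mimic the mutate/gsub step: prefix every matched address with 'email='."""
    return EMAIL_RE.sub(r"email=\1", message)


def simple_kv(message):
    """Simplified stand-in for the kv filter (assumption, not the real plugin).

    Splits on whitespace, keeps tokens of the form key=value, and collects
    repeated keys into a list. Unlike this sketch, the real filter stores a
    key that occurs only once as a plain string.
    """
    fields = {}
    for token in message.split():
        key, sep, value = token.partition("=")
        if sep and key and value:
            fields.setdefault(key, []).append(value)
    return fields


message = (
    "Recipients: Unwanted_text alice@somedomain.com other junk "
    "bob@someotherdomain.com some.hostname.net 1 carl@carlsplace.org even-more"
)
tagged = tag_emails(message)
print(tagged)
print(simple_kv(tagged))
```

One thing to keep in mind with this approach: kv also picks up any other key=value token that happens to be in the message, so unrelated fields may appear alongside email.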