I just set up an ELK stack for my Apache logs. It's working great. Now I want to add maillogs to the mix, and I'm having trouble parsing the logs with grok.
I'm using this site to debug: https://grokdebug.herokuapp.com/
Here is an example maillog (sendmail) entry:
Apr 24 19:38:51 ip-10-0-1-204 sendmail[9489]: w3OJco1s009487: to=<username@domain.us>, delay=00:00:01, xdelay=00:00:01, mailer=smtp, pri=120318, relay=webmx.bglen.net. [10.0.3.231], dsn=2.0.0, stat=Sent (Ok: queued as E2DEF60724), w3OJco1s009487: to=<username@domain.us>, delay=00:00:01, xdelay=00:00:01, mailer=smtp, pri=120318, relay=webmx.[redacted].net. [10.0.3.231], dsn=2.0.0, stat=Sent (Ok: queued as E2DEF60724)
From the text above, I want to pull out the text to=<username@domain.us>.
So far I have this for a grok pattern:
(?<mail_sent_to>[a-zA-Z0-9_.+=:-]+@[0-9A-Za-z][0-9A-Za-z-]{0,62}(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*)
It gives me the result username@domain.us> which is nice, but I want it to have the to= on the front as well. And I only want this grok filter to match email addresses which have to= in front of them.
I tried this, but it gives me "no matches" as a result:
(?<mail_sent_to>"to="[a-zA-Z0-9_.+=:-]+@[0-9A-Za-z][0-9A-Za-z-]{0,62}(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*)
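Your attempt returns no matches because the double quotes around "to=" are matched literally, and the < that follows to= in the log is not accounted for. You may use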
\b(?<mail_sent_to>to=<[a-zA-Z0-9_.+=:-]+@[0-9A-Za-z][0-9A-Za-z-]{0,62}(?:\.[0-9A-Za-z][0-9A-Za-z-]{0,62})*>)
or, since [a-zA-Z0-9_] matches the same chars as \w:
\b(?<mail_sent_to>to=<[\w.+=:-]+@[0-9A-Za-z][0-9A-Za-z-]{0,62}(?:\.[0-9A-Za-z][0-9A-Za-z-]{0,62})*>)
See the regex demo.
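If you want to check the pattern outside the grok debugger, here is a minimal Python sketch. It assumes Python's `re` module as a stand-in for grok's regex engine, so the named group is spelled `(?P<mail_sent_to>...)` instead of `(?<mail_sent_to>...)`; the names `PATTERN` and `LOG_LINE` are only illustrative, and the log line is a shortened copy of the one in the question.

```python
import re

# Python's re module spells named groups (?P<name>...), so the grok form
# (?<mail_sent_to>...) is rewritten here; the rest of the pattern is unchanged.
PATTERN = re.compile(
    r"\b(?P<mail_sent_to>to=<[\w.+=:-]+@[0-9A-Za-z][0-9A-Za-z-]{0,62}"
    r"(?:\.[0-9A-Za-z][0-9A-Za-z-]{0,62})*>)"
)

# Shortened copy of the sendmail entry from the question.
LOG_LINE = (
    "Apr 24 19:38:51 ip-10-0-1-204 sendmail[9489]: w3OJco1s009487: "
    "to=<username@domain.us>, delay=00:00:01, xdelay=00:00:01, mailer=smtp, "
    "pri=120318, dsn=2.0.0, stat=Sent (Ok: queued as E2DEF60724)"
)

match = PATTERN.search(LOG_LINE)
mail_sent_to = match.group("mail_sent_to") if match else None
print(mail_sent_to)  # to=<username@domain.us>
```

In a grok filter you would keep the original `(?<mail_sent_to>...)` spelling; only the Python check needs the `P`.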
Details
\b - a word boundary
(?<mail_sent_to> - start of the "mail_sent_to" group:
  to=< - a literal string to=<
  [\w.+=:-]+ - 1+ word, ., +, =, : or - chars
  @ - a @ char
  [0-9A-Za-z] - an alphanumeric char
  [0-9A-Za-z-]{0,62} - 0 to 62 letters, digits or -
  (?:\.[0-9A-Za-z][0-9A-Za-z-]{0,62})* - 0+ sequences of
    \. - a dot
    [0-9A-Za-z] - an alphanumeric char
    [0-9A-Za-z-]{0,62} - 0 to 62 letters, digits or -
  > - a > char
) - end of the group.
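The effect of the word boundary and the literal to=< can be sketched the same way. This is a rough check, again assuming Python's `re` (hence `(?P<...>)`) rather than grok itself; `FIXED`, `ATTEMPT`, `extract` and the `ctladdr_to` field are hypothetical names made up for the illustration. `ATTEMPT` is the pattern from the question with its literal double quotes kept.

```python
import re

# The suggested pattern, with the named group in Python's (?P<name>...) spelling.
FIXED = re.compile(
    r"\b(?P<mail_sent_to>to=<[\w.+=:-]+@[0-9A-Za-z][0-9A-Za-z-]{0,62}"
    r"(?:\.[0-9A-Za-z][0-9A-Za-z-]{0,62})*>)"
)

# The pattern tried in the question: the quotes around to= are literal characters.
ATTEMPT = re.compile(
    r'(?P<mail_sent_to>"to="[a-zA-Z0-9_.+=:-]+@[0-9A-Za-z][0-9A-Za-z-]{0,62}'
    r"(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*)"
)


def extract(pattern, text):
    """Return the mail_sent_to capture, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group("mail_sent_to") if m else None


results = {
    # A to= field is captured together with its prefix and angle brackets.
    "recipient": extract(FIXED, "to=<username@domain.us>, delay=00:00:01"),
    # A from= field has no to=< prefix, so nothing is captured.
    "sender": extract(FIXED, "from=<root@domain.us>, size=318"),
    # Hypothetical field name: "_" is a word char, so \b fails before "to".
    "glued": extract(FIXED, "ctladdr_to=<username@domain.us>"),
    # The log contains no double quotes, so the quoted attempt never matches.
    "attempt": extract(ATTEMPT, "to=<username@domain.us>, delay=00:00:01"),
}
print(results)
```

Only the first entry yields a capture; the other three come back as None, which lines up with the "no matches" result reported for the quoted attempt.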