regex logstash elastic-stack logstash-grok

Using grok to match custom style email address

I just set up an ELK stack for my Apache logs, and it's working great. Now I want to add mail logs to the mix, but I'm having trouble parsing them with grok.

I'm using this site to debug: https://grokdebug.herokuapp.com/

Here is an example maillog (sendmail) entry:

Apr 24 19:38:51 ip-10-0-1-204 sendmail[9489]: w3OJco1s009487: to=<username@domain.us>, delay=00:00:01, xdelay=00:00:01, mailer=smtp, pri=120318, relay=webmx.bglen.net. [10.0.3.231], dsn=2.0.0, stat=Sent (Ok: queued as E2DEF60724), w3OJco1s009487: to=<username@domain.us>, delay=00:00:01, xdelay=00:00:01, mailer=smtp, pri=120318, relay=webmx.[redacted].net. [10.0.3.231], dsn=2.0.0, stat=Sent (Ok: queued as E2DEF60724)

From the text above, I want to pull out the text to=<username@domain.us>.

So far I have this for a grok pattern:

(?<mail_sent_to>[a-zA-Z0-9_.+=:-]+@[0-9A-Za-z][0-9A-Za-z-]{0,62}(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*)

It gives me the result username@domain.us> which is nice, but I want it to have the to= on the front as well. And I only want this grok filter to match email addresses which have to= in front of them.

I tried this, but it gives me "no matches" as a result:

(?<mail_sent_to>"to="[a-zA-Z0-9_.+=:-]+@[0-9A-Za-z][0-9A-Za-z-]{0,62}(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*)

Solution

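The attempt with "to=" fails because the double quotes are matched literally, and the < that follows to= in the log is not accounted for. Put to=< and the closing > inside the group as plain literals instead. You may use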

\b(?<mail_sent_to>to=<[a-zA-Z0-9_.+=:-]+@[0-9A-Za-z][0-9A-Za-z-]{0,62}(?:\.[0-9A-Za-z][0-9A-Za-z-]{0,62})*>)

or, since \w covers [a-zA-Z0-9_] (in Unicode-aware engines it also matches other letters and digits):

\b(?<mail_sent_to>to=<[\w.+=:-]+@[0-9A-Za-z][0-9A-Za-z-]{0,62}(?:\.[0-9A-Za-z][0-9A-Za-z-]{0,62})*>)
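As a quick sanity check outside Logstash, the pattern can be tried against the sample entry in Python. This is only a sketch under a few assumptions: Python's re module spells named groups (?P<name>...) rather than (?<name>...), the sample line is a shortened copy of the entry from the question, and grok runs on a different regex engine (Oniguruma), so the result should still be confirmed in the grok debugger.

```python
import re

# Same pattern as above, rewritten with Python's (?P<name>...) named-group syntax.
MAIL_SENT_TO = re.compile(
    r"\b(?P<mail_sent_to>to=<[\w.+=:-]+@[0-9A-Za-z][0-9A-Za-z-]{0,62}"
    r"(?:\.[0-9A-Za-z][0-9A-Za-z-]{0,62})*>)"
)

# Shortened copy of the sendmail entry from the question.
sample = (
    "Apr 24 19:38:51 ip-10-0-1-204 sendmail[9489]: w3OJco1s009487: "
    "to=<username@domain.us>, delay=00:00:01, xdelay=00:00:01, "
    "mailer=smtp, pri=120318, dsn=2.0.0, stat=Sent (Ok: queued as E2DEF60724)"
)

match = MAIL_SENT_TO.search(sample)
mail_sent_to = match.group("mail_sent_to") if match else None
print(mail_sent_to)  # to=<username@domain.us>
```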


Details

\b - a word boundary
(?<mail_sent_to> - "mail_sent_to" group:
- to=< - a literal string to=<
- [\w.+=:-]+ - one or more word chars, ., +, =, : or -
- @ - a @ char
- [0-9A-Za-z] - an alphanumeric char
- [0-9A-Za-z-]{0,62} - 0 to 62 letters, digits or -
- (?:\.[0-9A-Za-z][0-9A-Za-z-]{0,62})* - 0+ sequences of
  - \. - a dot
  - [0-9A-Za-z] - an alphanumeric char
  - [0-9A-Za-z-]{0,62} - 0 to 62 letters, digits or -
- > - a > char
) - end of the group.
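The breakdown above can also be written as a commented, verbose regex, which makes it easy to check that only addresses prefixed with to=< are captured. This is a sketch in Python, not grok: the named group uses Python's (?P<name>...) syntax, and the helper extract_recipient and the three log fragments are made up for illustration.

```python
import re

# Verbose rendering of the breakdown above (Python named-group syntax).
PATTERN = re.compile(
    r"""
    \b                          # word boundary before "to"
    (?P<mail_sent_to>           # start of the named group
      to=<                      # literal "to=<"
      [\w.+=:-]+                # local part: word chars, '.', '+', '=', ':' or '-'
      @                         # literal '@'
      [0-9A-Za-z]               # first label starts with an alphanumeric
      [0-9A-Za-z-]{0,62}        # then 0 to 62 letters, digits or '-'
      (?:                       # zero or more further labels:
        \.                      #   a dot
        [0-9A-Za-z]             #   an alphanumeric
        [0-9A-Za-z-]{0,62}      #   0 to 62 letters, digits or '-'
      )*
      >                         # closing '>'
    )                           # end of the named group
    """,
    re.VERBOSE,
)


def extract_recipient(line):
    """Return the to=<...> token from a log line, or None if there is none."""
    m = PATTERN.search(line)
    return m.group("mail_sent_to") if m else None


# Hypothetical log fragments, made up for illustration.
results = [
    extract_recipient("w3OJco1s009487: to=<username@domain.us>, delay=00:00:01"),
    extract_recipient("w3OJco1s009487: from=<sender@domain.us>, size=318"),
    extract_recipient("w3OJco1s009487: to=username@domain.us, delay=00:00:01"),
]
print(results)  # ['to=<username@domain.us>', None, None]
```

The second and third fragments return None because the group requires the literal to=< prefix and the closing >, so from=<...> addresses and addresses without angle brackets are ignored.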