Tags: regex, logstash, elastic-stack, logstash-grok

Using grok to match custom style email address


I just set up an ELK stack for my apache logs. It's working great. Now I want to add maillogs to the mix, and I'm having trouble parsing the logs with grok.

I'm using this site to debug: https://grokdebug.herokuapp.com/

Here is an example maillog (sendmail) entry:

Apr 24 19:38:51 ip-10-0-1-204 sendmail[9489]: w3OJco1s009487: to=<[email protected]>, delay=00:00:01, xdelay=00:00:01, mailer=smtp, pri=120318, relay=webmx.bglen.net. [10.0.3.231], dsn=2.0.0, stat=Sent (Ok: queued as E2DEF60724), w3OJco1s009487: to=<[email protected]>, delay=00:00:01, xdelay=00:00:01, mailer=smtp, pri=120318, relay=webmx.[redacted].net. [10.0.3.231], dsn=2.0.0, stat=Sent (Ok: queued as E2DEF60724)

From the text above, I want to pull out the text to=<[email protected]>.

So far I have this for a grok pattern:

(?<mail_sent_to>[a-zA-Z0-9_.+=:-]+@[0-9A-Za-z][0-9A-Za-z-]{0,62}(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*)

It gives me the result [email protected]> which is nice, but I want the match to include the to= prefix as well. I also want this grok filter to match only email addresses that have to= in front of them.

I tried this, but it gives me "no matches" as a result:

(?<mail_sent_to>"to="[a-zA-Z0-9_.+=:-]+@[0-9A-Za-z][0-9A-Za-z-]{0,62}(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*)

Solution

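  • Your second attempt finds nothing because the double quotes around to= are matched literally, and the < that follows to= in the log line is not covered by the pattern. You may use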

    \b(?<mail_sent_to>to=<[a-zA-Z0-9_.+=:-]+@[0-9A-Za-z][0-9A-Za-z-]{0,62}(?:\.[0-9A-Za-z][0-9A-Za-z-]{0,62})*>)
    

    or, since [a-zA-Z0-9_] matches the same chars as \w:

    \b(?<mail_sent_to>to=<[\w.+=:-]+@[0-9A-Za-z][0-9A-Za-z-]{0,62}(?:\.[0-9A-Za-z][0-9A-Za-z-]{0,62})*>)
    


    Details

    • \b - a word boundary, so that to= is not matched at the end of a longer word
    • (?<mail_sent_to> - "mail_sent_to" group:
      • to=< - a literal string to=<
      • [\w.+=:-]+ - one or more word chars (letters, digits, _), ., +, =, : or - chars
      • @ - a @ char
      • [0-9A-Za-z] - an alphanumeric char
      • [0-9A-Za-z-]{0,62} - 0 to 62 letters, digits or -
      • (?:\.[0-9A-Za-z][0-9A-Za-z-]{0,62})* - zero or more sequences of
        • \. - a dot
        • [0-9A-Za-z] - an alphanumeric char
        • [0-9A-Za-z-]{0,62} - 0 to 62 letters, digits or -
      • > - a > char
    • ) - end of the group.
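
To sanity-check the pattern outside of Logstash, here is a minimal Python sketch. It only approximates the grok behaviour: Python's re module writes named groups as (?P<name>...) instead of (?<name>...), and re.ASCII is passed so that \w stays equal to [a-zA-Z0-9_]. The helper name extract_recipient and the log lines are made up for illustration, with user@example.com standing in for the address that is obscured in the question.

```python
import re

# Python translation of the second pattern from the answer.
# Assumption: (?P<name>...) replaces grok's (?<name>...), and re.ASCII
# keeps \w limited to [a-zA-Z0-9_].
MAIL_SENT_TO = re.compile(
    r"\b(?P<mail_sent_to>to=<[\w.+=:-]+@[0-9A-Za-z][0-9A-Za-z-]{0,62}"
    r"(?:\.[0-9A-Za-z][0-9A-Za-z-]{0,62})*>)",
    re.ASCII,
)


def extract_recipient(line):
    """Return the to=<...> token found in a log line, or None."""
    match = MAIL_SENT_TO.search(line)
    return match.group("mail_sent_to") if match else None


# Hypothetical log lines; user@example.com is a placeholder address.
delivered = (
    "Apr 24 19:38:51 ip-10-0-1-204 sendmail[9489]: w3OJco1s009487: "
    "to=<user@example.com>, delay=00:00:01, mailer=smtp, stat=Sent"
)
received = (
    "Apr 24 19:38:50 ip-10-0-1-204 sendmail[9487]: w3OJco1s009487: "
    "from=<user@example.com>, size=318, nrcpts=1"
)

print(extract_recipient(delivered))  # to=<user@example.com>
print(extract_recipient(received))   # None (address is not preceded by to=)

# \b stops "to=" from matching at the end of a longer word.
print(extract_recipient("auto=<user@example.com>"))  # None

# Each domain label is one alphanumeric char plus up to 62 more chars,
# so a 63-char label matches and a 64-char label does not.
print(extract_recipient("to=<user@" + "a" * 63 + ".com>") is not None)  # True
print(extract_recipient("to=<user@" + "a" * 64 + ".com>"))  # None
```

In Logstash itself the original (?<mail_sent_to>...) form is the one to put in the grok match; the Python spelling is only needed for this offline check.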