python json regex syslog

Python parse log using regex


Hope someone might be able to help. I have a log being sent from a syslog server to Python, which looks like this:

{'Raw': 'Nov 26 00:23:07 TEST 23856434232342 (2016-11-26T00:23:07) http-proxy[2063]: Allow 1-Trusted 0-External tcp 192.168.0.1 2.3.4.5 57405 80 msg="HTTP Request" proxy_act="HTTP-TEST" op="POST" dstname="www.google.com" arg="/" sent_bytes="351" rcvd_bytes="1400"  (HTTP-proxy-TEST-00)'}

I need to be able to extract the IP address, dstname=, sent_bytes= and rcvd_bytes= and, if possible, parse them to JSON. I started trying to use the regex (["'])(?:(?=(\\?))\2.)*?\1 to match the double quotes, but it's not working correctly.

Any ideas how I might get the data I need? Or how to parse the above to json?

Thanks


Solution

  • Assuming the IP, dstname, sent_bytes and rcvd_bytes always appear in that order, use re.findall to get them all:

    import re
    s = r"""{'Raw': 'Nov 26 00:23:07 TEST 23856434232342 (2016-11-26T00:23:07) http-proxy[2063]: Allow 1-Trusted 0-External tcp 192.168.0.1 2.3.4.5 57405 80 msg="HTTP Request" proxy_act="HTTP-TEST" op="POST" dstname="www.google.com" arg="/" sent_bytes="351" rcvd_bytes="1400" (HTTP-proxy-TEST-00)'}"""
    
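    # Raw string for the pattern, so \s reaches the regex engine unchanged
    matches = re.findall(r'(?:tcp |dstname=|sent_bytes=|rcvd_bytes=)"?([^\s"]+)', s)
    # matches == ['192.168.0.1', 'www.google.com', '351', '1400']
    ip, dstname, sent_bytes, rcvd_bytes = matches
    # these four values can then be put in a dict and serialized with json.dumps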
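
If the field order cannot be relied on, one alternative is to collect every key="value" pair into a dict and look the fields up by name, then serialize with json.dumps. The following is only a minimal sketch, not the answer's method: it assumes the incoming record is a Python dict with a 'Raw' key (as the sample suggests) and that the source IP is the first token after "tcp". The names parse_record, record, parsed and as_json are made up for illustration.

```python
import json
import re

# Sample record copied from the question (assumed to arrive as a dict).
record = {'Raw': 'Nov 26 00:23:07 TEST 23856434232342 (2016-11-26T00:23:07) http-proxy[2063]: Allow 1-Trusted 0-External tcp 192.168.0.1 2.3.4.5 57405 80 msg="HTTP Request" proxy_act="HTTP-TEST" op="POST" dstname="www.google.com" arg="/" sent_bytes="351" rcvd_bytes="1400"  (HTTP-proxy-TEST-00)'}


def parse_record(raw):
    """Hypothetical helper: pull selected fields out of one raw log line."""
    # Collect every key="value" pair, so the order of fields does not matter.
    fields = dict(re.findall(r'(\w+)="([^"]*)"', raw))
    # Assumption: the source IP is the first token after the protocol "tcp".
    ip_match = re.search(r'\btcp (\d{1,3}(?:\.\d{1,3}){3})', raw)
    return {
        "ip": ip_match.group(1) if ip_match else None,
        "dstname": fields.get("dstname"),
        "sent_bytes": fields.get("sent_bytes"),
        "rcvd_bytes": fields.get("rcvd_bytes"),
    }


parsed = parse_record(record["Raw"])
as_json = json.dumps(parsed)
print(as_json)
# {"ip": "192.168.0.1", "dstname": "www.google.com", "sent_bytes": "351", "rcvd_bytes": "1400"}
```

The byte counts stay as strings here; wrap them in int() if numeric JSON values are needed. Missing fields come back as None (null in JSON) instead of raising an unpacking error.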