
Python regex parsing of syslog


I have a syslog file with this format.

Mar  7 13:44:55 host.domain.example.net/10.10.10.10 Application: Info: MODULE: Startup MESSAGE: Application Version: 8.44.0
Mar  7 13:44:55 host.domain.example.net/10.10.10.10 Application: Info: MODULE: Startup MESSAGE: Run on system: host
Mar  7 13:44:55 host.domain.example.net/10.10.10.10 Application: Info: MODULE: Startup MESSAGE: Running as user: SYSTEM
Mar  7 13:44:55 host.domain.example.net/10.10.10.10 Application: Info: MODULE: Startup MESSAGE: User has admin rights: yes
Mar  7 13:44:55 host.domain.example.net/10.10.10.10 Application: Info: MODULE: Startup MESSAGE: Start Time: 2016-03-07 13:44:55
Mar  7 13:44:55 host.domain.example.net/10.10.10.10 Application: Info: MODULE: Startup MESSAGE: IP Address: 10.10.10.10
Mar  7 13:44:55 host.domain.example.net/10.10.10.10 Application: Info: MODULE: Startup MESSAGE: CPU Count: 1
Mar  7 13:44:55 host.domain.example.net/10.10.10.10 Application: Info: MODULE: Startup MESSAGE: System Type: Server
Mar  7 13:44:55 host.domain.example.net/10.10.10.10 Application: Info: MODULE: Startup MESSAGE: System Uptime: 18.10 days
Mar  7 13:44:55 host.domain.example.net/10.10.10.10 Application: MODULE: InitHead MESSAGE: => Reading signature and hash files ...
Mar  7 13:44:55 host.domain.example.net/10.10.10.10 Application: Notice: MODULE: Init MESSAGE: file-type-signatures.cfg initialized with 80 values.
Mar  7 13:44:56 host.domain.example.net/10.10.10.10 Application: Notice: MODULE: Init MESSAGE: signatures/filename-characteristics.dat initialized with 2778 values.
Mar  7 13:44:56 host.domain.example.net/10.10.10.10 Application: Notice: MODULE: Init MESSAGE: signatures/keywords.dat initialized with 63 values.
Some logs ...
Mar  7 17:42:08 host.domain.example.net/10.10.10.10 Application: Results: MODULE: Report MESSAGE: Results: 0 Alarms, 0 Warnings, 131 Notices, 2 Errors
Mar  7 17:42:08 host.domain.example.net/10.10.10.10 Application: End: MODULE: Report MESSAGE: Begin Time: 2016-03-07 13:44:55
Mar  7 17:42:08 host.domain.example.net/10.10.10.10 Application: End: MODULE: Report MESSAGE: End Time: 2016-03-07 17:42:07
Mar  7 17:42:08 host.domain.example.net/10.10.10.10 Application: End: MODULE: Report MESSAGE: Scan took 3 hours 57 mins 11 secs
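Every line in this sample seems to follow the same layout: timestamp, `host/IP`, application name, an optional level (the `InitHead` line has none), then `MODULE:` and `MESSAGE:`. As a sketch, assuming that layout holds for every line of interest, one pattern with named groups can split a line into its parts. `LINE_RE` and `parse_line` are illustrative names, not from the question.

```python
import re

# Illustrative pattern for the sample format above. It assumes each line looks like
# "<Mon> <day> <HH:MM:SS> <host>/<ip> <app>: [<level>:] MODULE: <module> MESSAGE: <text>"
LINE_RE = re.compile(
    r'^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) '
    r'(?P<host>\S+?)/(?P<ip>\S+) '
    r'(?P<app>[^:]+): '
    r'(?:(?P<level>[^:]+): )?'   # optional: the "InitHead" line has no level
    r'MODULE: (?P<module>\S+) '
    r'MESSAGE: (?P<message>.*)$'
)


def parse_line(line):
    """Return a dict of the named fields, or None for lines that don't match."""
    m = LINE_RE.match(line.rstrip("\n"))
    return m.groupdict() if m else None


if __name__ == "__main__":
    sample = ("Mar  7 13:44:55 host.domain.example.net/10.10.10.10 "
              "Application: MODULE: InitHead MESSAGE: => Reading signature and hash files ...")
    print(parse_line(sample))
```

Lines such as "Some logs ..." simply return `None`, so they can be skipped when looping over the file.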

How can I extract the "Application Version", "Run on system", "User has admin rights", "Start Time", "IP Address", "CPU Count", "System Type", "System Uptime", and "End Time" values, and the counts of "Alarms", "Warnings", "Notices", and "Errors", using Python?

Actually, I am new to Python, so I don't really know how to do this, but I managed to write a function named finder():

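import re

def finder(fname, pattern):
    # Return the first line of the file that matches the regex `pattern` (None if no line does).
    with open(fname, "r") as hand:
        for line in hand:
            line = line.rstrip()
            if re.search(pattern, line):
                return line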

To get the line with the IP address, I call it like this:

finder("file path", "MESSAGE: IP Address")

This returns the full line. I need help getting only the IP address part, and the rest of the information from the other lines as well.
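One way to get only the IP address is to put a capture group after the key and return what it captured instead of the whole line. This is a minimal sketch assuming each value follows `MESSAGE: <key>: ` as in the sample log; `find_value` is a hypothetical name, not part of the question's code, and it accepts any iterable of lines (such as an open file).

```python
import re


def find_value(lines, key):
    """Return the text after "MESSAGE: <key>:" on the first matching line, or None.

    Hypothetical helper: a variant of finder() that returns only the captured value.
    """
    # re.escape makes sure the key is matched literally, not as a regex
    pattern = re.compile(r'MESSAGE:\s+%s:\s+(.*)' % re.escape(key))
    for line in lines:
        m = pattern.search(line)
        if m:
            # group(1) is only the part captured by (.*), i.e. the value
            return m.group(1).rstrip()
    return None
```

Usage, with the same placeholder path as above: `with open("file path") as hand: print(find_value(hand, "IP Address"))`.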


Solution

  • Please check the links below before going through the code; they will help a lot.

    1. re module - the module used here. The linked page has a good explanation along with examples.
    2. Python Regex Tester - here you can test your regex and the regex-related functions available in Python. I used it to test the regexes below.

    Code with inline comments

    import re

    # The information we need to collect.
    info_list = ["Application Version", "Run on system", "User has admin rights", "Start Time", "IP Address", "CPU Count", "System Type", "System Uptime", "End Time", "Results", "Begin Time"]

    with open("out.txt", "r") as fo:
        for line in fo:
            for srch_pat in info_list:
                # First check whether the information we need is present in the line.
                if srch_pat in line:
                    # Capture the value after "MESSAGE: <srch_pat>:", e.g. the version number for Application Version.
                    # re.escape makes sure the search text is matched literally.
                    m = re.search(r'MESSAGE:\s+%s:\s+(.*)' % re.escape(srch_pat), line)
                    if m is None:
                        # The text occurs elsewhere in the line, not as a MESSAGE key; skip it.
                        continue

                    if srch_pat == "Results":
                        # For the results line, this regex extracts the four counts.
                        result_regex = re.search(r'(\d+)\s+Alarms,\s+(\d+)\s+Warnings,\s+(\d+)\s+Notices,\s+(\d+)\s+Errors', m.group(1))
                        print('Alarms - ', result_regex.group(1))
                        print('Warnings - ', result_regex.group(2))
                        print('Notices - ', result_regex.group(3))
                        print('Errors - ', result_regex.group(4))
                    else:
                        print(srch_pat, '-', m.group(1))
    

    Output

    C:\Users\dinesh_pundkar\Desktop>python a.py
    Application Version - 8.44.0
    Run on system - host
    User has admin rights - yes
    Start Time - 2016-03-07 13:44:55
    IP Address - 10.10.10.10
    CPU Count - 1
    System Type - Server
    System Uptime - 18.10 days
    Alarms -  0
    Warnings -  0
    Notices -  131
    Errors -  2
    Begin Time - 2016-03-07 13:44:55
    End Time - 2016-03-07 17:42:07
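For comparison, here is a sketch that reads the file once and stores the values in a dictionary instead of printing them, which may be handier if they are needed later. It assumes the same `MESSAGE: <key>: <value>` layout as the sample; `KEYS`, `KEY_RE`, `COUNTS_RE` and `collect` are illustrative names, and the single alternation pattern is a swapped-in design choice rather than the approach used in the answer above.

```python
import re

# Fields requested in the question; "Begin Time" is included as in the answer above.
KEYS = ["Application Version", "Run on system", "User has admin rights",
        "Start Time", "IP Address", "CPU Count", "System Type",
        "System Uptime", "Begin Time", "End Time"]

# One alternation pattern instead of one search per key; re.escape keeps keys literal.
KEY_RE = re.compile(r'MESSAGE:\s+(%s):\s+(.*)' % "|".join(map(re.escape, KEYS)))
# Matches "<number> <Alarms|Warnings|Notices|Errors>" pairs on the Results line.
COUNTS_RE = re.compile(r'(\d+)\s+(Alarms|Warnings|Notices|Errors)')


def collect(lines):
    """Scan the log once and return a dict of the requested values.

    If a key appears more than once, later lines overwrite earlier ones.
    """
    info = {}
    for line in lines:
        m = KEY_RE.search(line)
        if m:
            info[m.group(1)] = m.group(2).rstrip()
        elif "MESSAGE: Results:" in line:
            # e.g. "0 Alarms, 0 Warnings, 131 Notices, 2 Errors"
            for count, name in COUNTS_RE.findall(line):
                info[name] = int(count)
    return info
```

Usage: `with open("out.txt") as fo: info = collect(fo)`, after which `info["IP Address"]` holds the address string and `info["Errors"]` the error count as an `int`.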