
python monitor a log file non blocking


I have a test stub that will write several log messages to the system log.

But, this system log gets updated by many other applications as well. So, basically, I want to do a tail -f system.log | grep "application name" to get only the appropriate log messages.

I was looking at David Beazley's (dabeaz) generator tricks, and I am trying to combine both http://www.dabeaz.com/generators/follow.py and http://www.dabeaz.com/generators/apachelog.py

So, in my __main__(), I have something like this:

try:
   dosomeprocessing()     #outputs stuff to the log file

And within dosomeprocessing(), I run a loop, and on each iteration I want to see if there are any new log messages caused by my application, and not necessarily print them out, but store them somewhere to do some validation.

    logfile = open("/var/adm/messages","r")
    loglines = follow(logfile)
    logpats = r'I2G(JV)'
    logpat = re.compile(logpats)
    groups = (logpat.match(line) for line in loglines)
    for g in groups:
        if g:
            print g.groups()
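(For context, the follow() generator from that page is, roughly, a tail -f loop:)

```python
import time

def follow(thefile):
    """Yield lines as they are appended to the file, like `tail -f`."""
    thefile.seek(0, 2)            # jump to the end of the file
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1)       # no new data yet; back off briefly
            continue
        yield line
```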

The log looks something like :

Feb  4 12:55:27 Someprocessname.py I2G(JV)-300[20448]: [ID 702911 local2.error] [MSG-70047] xxxxxxxxxxxxxxxxxxxxxxx
Feb  4 12:55:27 Someprocessname.py I2G(JV)-300[20448]: [ID 702911 local2.error] [MSG-70055] xxxxxxxxxxxxxxxxxxxxxxx

in addition to a lot of other gobbledygook.

Right now, it gets stuck in the for g in groups: loop.

I am relatively new to Python and asynchronous programming. Ideally, I would like to have the tail running in parallel with the main process, and read new data on each loop iteration.

Please let me know if I need to add more information.


Solution

  • I suggest you use either watchdog or pyinotify to monitor changes to your log file.

    Also, I would suggest remembering the last position you read from. After you get an IN_MODIFY notification, you can read from the last position to the end of the file and apply your loop again. Reset the last position to 0 when it is bigger than the size of the file, in case the file was truncated.

    Here is an example:

    import pyinotify
    import re
    import os


    wm = pyinotify.WatchManager()
    mask = pyinotify.IN_MODIFY


    class EventHandler(pyinotify.ProcessEvent):

        def __init__(self, file_path, *args, **kwargs):
            super(EventHandler, self).__init__(*args, **kwargs)
            self.file_path = file_path
            self._last_position = 0
            logpats = r'I2G\(JV\)'   # parentheses escaped so they match literally
            self._logpat = re.compile(logpats)

        def process_IN_MODIFY(self, event):
            print "File changed: ", event.pathname
            # If the file shrank, it was truncated; start over from the top.
            if self._last_position > os.path.getsize(self.file_path):
                self._last_position = 0
            with open(self.file_path) as f:
                f.seek(self._last_position)      # skip what was already read
                loglines = f.readlines()
                self._last_position = f.tell()   # remember where we stopped
                groups = (self._logpat.search(line.strip()) for line in loglines)
                for g in groups:
                    if g:
                        print g.string           # g.string is the whole matched line


    handler = EventHandler('some_log.log')
    notifier = pyinotify.Notifier(wm, handler)

    wm.add_watch(handler.file_path, mask)
    notifier.loop()
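    If pyinotify is not available (it is Linux-only), the same remember-the-last-position idea works with plain polling from your existing loop. A stdlib-only sketch (the function name and the idea of passing a state dict are placeholders, not part of any library):

    ```python
    import os
    import re

    # Parentheses escaped so "I2G(JV)" matches literally, as in the question.
    LOGPAT = re.compile(r'I2G\(JV\)')

    def read_new_lines(path, state):
        """Return matching lines appended since the previous call.

        `state` is a mutable dict that carries the read offset between calls.
        """
        # If the file shrank, it was truncated/rotated: start over from 0.
        if state.get('pos', 0) > os.path.getsize(path):
            state['pos'] = 0
        with open(path) as f:
            f.seek(state.get('pos', 0))
            lines = f.readlines()
            state['pos'] = f.tell()
        return [line.rstrip('\n') for line in lines if LOGPAT.search(line)]
    ```

    You would call read_new_lines() once per iteration of dosomeprocessing()'s loop and collect the results for validation, instead of blocking on a notifier.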