linux perl parsing logging universal

Perl program structure for parsing


I've got a question about program architecture. Say you've got 100 log files in different formats and you need to parse them and put that info into an SQL database. My view of it is like:

  1. Use a general config file like:

    program1->name1("apache",/var/log/apache.log) (modulename,path to logfile1)
    program2->name2("exim",/var/log/exim.log) (modulename,path to logfile2)
    
    ....
    
    sqldb->configuration
    
  2. Use something like a module (one file per program), e.g. type1.module, holding the regexps, the log structure (some variables), and the SQL (tables and functions).

  3. Fork or thread processes (I don't know which is better on Linux now) for the different programs.

So the question is: is my view of this correct? Should I use one module per program (web/MTA/iptables), or is there a better way? I think some regexps would be the same, like date/time/IP/URL. What should I do with those? Or what have I missed?


Example: MTA exim4 mainlog

2011-04-28 13:16:24 1QFOGm-0005nQ-Ig <= exim@mydomain.org.ua** H=localhost (exim.mydomain.org.ua) [127.0.0.1]:51127 I=[127.0.0.1]:465 P=esmtpsa X=TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32 CV=no A=plain_server:spam S=763 id=1303985784.4db93e788cb5c@mydomain.org.ua T="test" from <exim@exim.mydomain.org.ua> for test@domain.ua

Everything in bold is already parsed and will be put into the sqldb.incoming table. Right now I have a structure in Perl that holds every parsed variable, like $exim->{timstamp} or $exim->{host}->{ip}.

My program will do something like tail -f /file and parse it line by line.
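A minimal sketch of that tail -f style reading in plain Perl. The helper name read_new_lines is hypothetical, and the sketch assumes the log is only appended to (no rotation or truncation) and that the writer always finishes a line before the reader sees it. A temporary file stands in for the real log so the example runs on its own.

```perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempfile);

# Return the lines appended to $fh since the previous call.
# seek() with offset 0 relative to the current position does not move,
# but it clears the handle's EOF condition so new data can be read.
sub read_new_lines {
    my ($fh) = @_;
    seek($fh, 0, 1);
    my @lines = <$fh>;
    chomp @lines;
    return @lines;
}

# Demo: a temp file plays the role of /var/log/... (assumption for the sketch).
my ($writer, $path) = tempfile(UNLINK => 1);
$writer->autoflush(1);
open(my $reader, '<', $path) or die "cannot open $path: $!";

print {$writer} "first line\n";
my @batch1 = read_new_lines($reader);

print {$writer} "second line\n", "third line\n";
my @batch2 = read_new_lines($reader);

print "batch 1: @batch1\n";
print "batch 2: @batch2\n";
```

In a real follower, the call to read_new_lines would sit inside a loop with a short sleep between polls, handing each returned line to the parser.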

Flexibility: let's say I want to add support for an Apache server (just the timestamp, user IP, and file downloaded). All I need to know is which log file to parse, what the regexp should be, and what the SQL structure should be. So I'm planning to make this a module and just fork or thread the main process with parameters (logfile, filetype). Later I might add options for what not to parse (maybe some log level is low and you just don't see much there).


Solution

  • I would do it like this:

    1. Create a config file that is formatted like this: appname:logpath:logformatname
    2. Create a collection of Perl classes that inherit from a base parser class.
    3. Write a script which loads the config file and then loops over its contents, passing each entry to its appropriate handler object.
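The three steps above can be sketched roughly as follows. All package and function names here (LogParser::Base, LogParser::Exim, LogParser::Apache, load_config, %handler_for) are made up for illustration, the Apache pattern assumes the Common Log Format, and sample lines stand in for reading the real files. Database inserts are left out.

```perl
use strict;
use warnings;

# Step 2: a base class; subclasses supply only a regex and field names.
package LogParser::Base;

sub new {
    my ($class, %args) = @_;
    return bless { path => $args{path} }, $class;
}

sub pattern { die "subclass must define pattern()" }
sub fields  { die "subclass must define fields()" }

# Returns a hashref of named fields, or nothing if the line does not match.
sub parse_line {
    my ($self, $line) = @_;
    my @values = $line =~ $self->pattern or return;
    my %record;
    @record{ $self->fields } = @values;
    return \%record;
}

package LogParser::Exim;
use parent -norequire, 'LogParser::Base';

sub pattern { qr/^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d) (\S+) <= (\S+)/ }
sub fields  { qw(timestamp msgid sender) }

package LogParser::Apache;
use parent -norequire, 'LogParser::Base';

sub pattern { qr/^(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)/ }
sub fields  { qw(ip timestamp path) }

package main;

my %handler_for = (exim => 'LogParser::Exim', apache => 'LogParser::Apache');

# Step 1: parse "appname:logpath:logformatname" lines.
sub load_config {
    my (@lines) = @_;
    my @entries;
    for my $line (@lines) {
        next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
        chomp $line;
        my ($app, $path, $format) = split /:/, $line, 3;
        push @entries, { app => $app, path => $path, format => $format };
    }
    return @entries;
}

my @config = load_config(
    "# appname:logpath:logformatname\n",
    "exim:/var/log/exim.log:exim\n",
    "apache:/var/log/apache.log:apache\n",
);

# Sample lines used instead of opening the real log files.
my %sample_line = (
    exim   => '2011-04-28 13:16:24 1QFOGm-0005nQ-Ig <= exim@mydomain.org.ua H=localhost',
    apache => '127.0.0.1 - - [28/Apr/2011:13:16:24 +0300] "GET /file.tar.gz HTTP/1.1" 200 763',
);

# Step 3: loop over the config and hand each entry to its handler object.
my %parsed;
for my $entry (@config) {
    my $class = $handler_for{ $entry->{format} }
        or die "no parser for format $entry->{format}";
    my $parser = $class->new(path => $entry->{path});
    my $record = $parser->parse_line($sample_line{ $entry->{format} });
    $parsed{ $entry->{app} } = $record;
    print "$entry->{app}: ",
        join(', ', map { "$_=$record->{$_}" } sort keys %$record), "\n";
}
```

With this layout, supporting a new program means adding one small subclass and one config line; the dispatch loop stays unchanged.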

    If you want an example of steps 1 and 2, we have one on our project. See MT::FileMgr and MT::FileMgr::* here.
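On the question's point about regexps that repeat across formats (date, time, IP, URL): one option is to keep them as precompiled qr// fragments in a shared place and interpolate them into each format's pattern. The sketch below is only an illustration; %RE and $EXIM_ARRIVAL are hypothetical names, the IPv4 fragment is deliberately loose, and the sample line is a shortened version of the exim line from the question.

```perl
use strict;
use warnings;

# Shared building blocks; in practice these could live in a common module.
my %RE = (
    date => qr/\d{4}-\d\d-\d\d/,
    time => qr/\d\d:\d\d:\d\d/,
    ipv4 => qr/\d{1,3}(?:\.\d{1,3}){3}/,
);

# A format-specific pattern assembled from the shared fragments.
# Named captures (Perl 5.10+) keep the field names next to the regex.
my $EXIM_ARRIVAL = qr/
    ^ (?<timestamp> $RE{date} \s $RE{time} ) \s
      (?<msgid>  \S+ ) \s <= \s
      (?<sender> \S+ )
      .*? \[ (?<ip> $RE{ipv4} ) \]
/x;

my $line = '2011-04-28 13:16:24 1QFOGm-0005nQ-Ig <= exim@mydomain.org.ua '
         . 'H=localhost (exim.mydomain.org.ua) [127.0.0.1]:51127 P=esmtpsa';

# Build a nested structure similar to the one described in the question.
my %exim;
if ($line =~ $EXIM_ARRIVAL) {
    %exim = (
        timestamp => $+{timestamp},
        msgid     => $+{msgid},
        sender    => $+{sender},
        host      => { ip => $+{ip} },
    );
}

print "$exim{timestamp} $exim{msgid} from $exim{sender} via $exim{host}{ip}\n";
```

An Apache parser could reuse $RE{ipv4} the same way, so a fix to a shared fragment reaches every format that uses it.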