I've got a question about program architecture. Say you've got 100 different log files with different formats, and you need to parse them and put that info into an SQL database. My view of it is like this:
use a general config file like:
program1 -> name1("apache", /var/log/apache.log)   (module name, path to logfile 1)
program2 -> name2("exim", /var/log/exim.log)   (module name, path to logfile 2)
....
sqldb->configuration
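One concrete way to write that down is an INI-style file, which CPAN's Config::Tiny can read directly; the section and key names here are my own invention, not a standard:

```ini
; log_parsers.conf -- hypothetical layout
[apache]
module  = Apache
logfile = /var/log/apache.log

[exim]
module  = Exim
logfile = /var/log/exim4/mainlog

[sqldb]
dsn  = DBI:mysql:database=logs;host=localhost
user = loguser
pass = secret
```

Reading it is then one lookup per value, e.g. `my $cfg = Config::Tiny->read('log_parsers.conf'); my $path = $cfg->{exim}{logfile};`.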
use something like a module (one file per program), e.g. type1.module, holding the regexp, the log structure (some variables), and the SQL (tables and functions).
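A minimal sketch of what one such module could look like in Perl (the package name, fields, and regexp are all illustrative assumptions, not a finished exim parser):

```perl
package LogParser::Exim;            # hypothetical module, one per program
use strict;
use warnings;

# Regexp that extracts the fields we care about from one log line.
my $LINE_RE = qr{
    ^(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})   # timestamp
    \s+(\S+)                                  # exim message id
    \s+(<=|=>)                                # direction
    \s+(\S+)                                  # address
}x;

# Table this module writes to, and the columns it fills.
sub table   { 'incoming' }
sub columns { qw(timestamp msg_id direction address) }

# Parse one line into a hashref, or return undef if it doesn't match.
sub parse_line {
    my ($class, $line) = @_;
    my @fields = $line =~ $LINE_RE or return undef;
    my %row;
    @row{ $class->columns } = @fields;
    return \%row;
}

1;
```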
fork or thread a process for each program (I don't know which is better on Linux these days).
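On Linux, fork() is the usual choice for Perl here (Perl's ithreads are comparatively heavyweight). A sketch of one worker process per program, with the worker stubbed out and the program list hard-coded where it would really come from the config file:

```perl
use strict;
use warnings;

my %programs = (                       # in practice read from the config file
    apache => '/var/log/apache.log',
    exim   => '/var/log/exim4/mainlog',
);

sub run_parser {                       # stub: the real worker would tail
    my ($name, $logfile) = @_;         # the file and feed lines to a module
    warn "[$name] would parse $logfile\n";
}

my @kids;
for my $name (sort keys %programs) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                   # child process: one log, one parser
        run_parser($name, $programs{$name});
        exit 0;
    }
    push @kids, $pid;                  # parent keeps the child PIDs
}
waitpid $_, 0 for @kids;               # reap workers (real ones run forever)
```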
So the question is: is my view of this correct? Should I use one module per program (web/MTA/iptables), or is there a better way? I think some regexps would be the same, like date/time/IP/URL; what should I do about those? What have I missed?
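For the shared pieces (date/time/IP/URL), one common answer is to factor them into a module every parser imports; CPAN's Regexp::Common also ships ready-made patterns like `$RE{net}{IPv4}`. A sketch of the hand-rolled version (the names are mine):

```perl
package LogParser::Common;             # hypothetical shared-pattern module
use strict;
use warnings;
use Exporter 'import';
our @EXPORT_OK = qw($RE_DATE $RE_TIME $RE_IPV4);

our $RE_DATE = qr/\d{4}-\d{2}-\d{2}/;          # 2011-04-28
our $RE_TIME = qr/\d{2}:\d{2}:\d{2}/;          # 13:16:24
our $RE_IPV4 = qr/(?:\d{1,3}\.){3}\d{1,3}/;    # 127.0.0.1

1;
```

Each format module then only adds what is specific to it, e.g. `qr/^($RE_DATE) ($RE_TIME)/`.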
Example: MTA exim4 mainlog:
2011-04-28 13:16:24 1QFOGm-0005nQ-Ig <= **exim@mydomain.org.ua** H=localhost (exim.mydomain.org.ua) [127.0.0.1]:51127 I=[127.0.0.1]:465 P=esmtpsa X=TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32 CV=no A=plain_server:spam S=763 id=1303985784.4db93e788cb5c@mydomain.org.ua T="test" from <exim@exim.mydomain.org.ua> for test@domain.ua
Everything that is bold is already parsed and will be put into the sqldb.incoming table. Right now I have a structure in Perl to hold every parsed variable, like $exim->{timestamp} or $exim->{host}->{ip}.
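As an illustration, pulling a few of those fields out of the sample line above into that structure could look like this (the regexp is a rough sketch, not a complete exim mainlog grammar):

```perl
use strict;
use warnings;

my $line = '2011-04-28 13:16:24 1QFOGm-0005nQ-Ig <= exim@mydomain.org.ua'
         . ' H=localhost (exim.mydomain.org.ua) [127.0.0.1]:51127 P=esmtpsa S=763';

my $exim = {};
if ($line =~ m{
        ^(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})   # timestamp
        \s+(\S+)                                  # exim message id
        \s+<=\s+(\S+)                             # sender address
        .*?\[([\d.]+)\]                           # first host IP
    }x) {
    $exim->{timestamp} = $1;
    $exim->{msgid}     = $2;
    $exim->{from}      = $3;
    $exim->{host}{ip}  = $4;
}
```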
My program will do something like tail -f /file and parse it line by line.
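CPAN's File::Tail gives you the tail -f behaviour from inside Perl (it also copes with the file being truncated or rotated). Combined with a DBI insert, the per-line loop is roughly as follows; the DB credentials, table columns, and the LogParser::Exim module are the assumptions from the sketches above:

```perl
use strict;
use warnings;
use File::Tail;
use DBI;
use LogParser::Exim;                   # hypothetical module sketched above

my $dbh = DBI->connect('DBI:mysql:database=logs', 'loguser', 'secret',
                       { RaiseError => 1 });
my $sth = $dbh->prepare(
    'INSERT INTO incoming (timestamp, msg_id, direction, address)
     VALUES (?, ?, ?, ?)'
);

my $tail = File::Tail->new(
    name        => '/var/log/exim4/mainlog',
    maxinterval => 5,                  # poll the file at most every 5s
);

while (defined(my $line = $tail->read)) {
    chomp $line;
    my $row = LogParser::Exim->parse_line($line) or next;
    $sth->execute(@{$row}{ LogParser::Exim->columns });
}
```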
Flexibility: say I want to add support for the Apache server (just timestamp, user IP, and file downloaded). All I need to know is which logfile to parse, what the regexp should be, and what the SQL structure should be. So I'm planning to have this as a module: just fork or thread the main process with parameters (logfile, filetype). Maybe later I'd add options for what not to parse (maybe some log level is low and you just don't see much there).
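The (logfile, filetype) dispatch can be plain runtime module loading; a sketch, assuming the LogParser::<Type> naming used in the module sketch above:

```perl
use strict;
use warnings;

sub load_parser {
    my ($filetype) = @_;                      # e.g. 'exim' or 'apache'
    my $class = 'LogParser::' . ucfirst lc $filetype;
    (my $path = "$class.pm") =~ s{::}{/}g;    # LogParser::Exim -> LogParser/Exim.pm
    require $path;                            # dies if no such module exists
    return $class;
}

my ($logfile, $filetype) = @ARGV;             # handed over by the forked parent
my $parser = load_parser($filetype);
# ...then tail $logfile and call $parser->parse_line($_) per line,
#    skipping whatever the "do not parse" options exclude.
```

Adding Apache support then means dropping in a new LogParser::Apache file and one config entry, with no changes to the main process.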
I would do it like this:
If you want an example of steps 1 and 2, we have one in our project. See MT::FileMgr and MT::FileMgr::* here.