Tags: perl, perl-module, separation-of-concerns

Should I read from ARGV in my own Perl module?


My Perl script extracts and processes data from one or more log files, currently processing all files in @ARGV.

The most important part of this script is the log decoding itself, which incorporates a lot of knowledge about the log file format. This transformation from log lines into an array of hashes has proven to be subject to change (as the log format evolves) and to be the basis for further processing steps: there are often specific questions to answer from the decoded records, which is best done right in Perl. That's why I'm thinking of making it a module.

The core function uses nested (or, if you prefer, scoped) pattern matching inside a while (<>) loop:

while (<ARGV>) {
    $totalLines ++;
    if (m/^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d) L(\d) (.+)/) {
        my $time = $1;
        my $line = $2;
        my $event = $3;
        if ($event =~ m/^connect: (.+)$/) {
            $pendings{$line}{station} = $1;
            ...

...more than 200 lines follow before the closing braces.

I have the feeling that simply reading from ARGV would violate the "do one thing and do it well" rule. When I searched the web, I found nothing that speaks explicitly for or against reading from ARGV in a module, but maybe my search terms were just poor. [1] [2]

(How) should I re-frame my decoding for placing it into a module?
...or should I change my feelings about this?


[1] perltrap - perldoc.perl.org
[2] perlmodstyle - perldoc.perl.org


Solution

  • You can keep your function unaware of the <ARGV> iteration logic by passing it an iterator:

    sub foo {
        my ($iter) = @_;
    
        # Perl adds the defined() test implicitly only for readline forms such as
        # `while (<ARGV>)`, so it has to be written out for an iterator call
        while (defined (my $line = $iter->())) {
            # if ..
        }
    }
    
    foo(sub{ scalar <ARGV> }); # force scalar context; one line/record per call
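
As a rough sketch of how the decoding itself might look once it no longer knows about ARGV: the function below takes a line iterator and returns an array reference of hashes. The name `decode_log` and the record fields `time`, `line` and `event` are assumptions made for illustration; only the regular expression is taken from the question, and the 200-odd lines of real event handling are left out.

```perl
use strict;
use warnings;

# Hypothetical decoder: takes a code ref that returns one line per call
# (undef at end of input) and returns an array ref of hash records.
sub decode_log {
    my ($next_line) = @_;
    my @records;

    while (defined(my $text = $next_line->())) {
        # Same timestamp / line-number / event pattern as in the question;
        # lines that do not match are skipped.
        next unless $text =~ m/^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d) L(\d) (.+)/;
        push @records, { time => $1, line => $2, event => $3 };
    }
    return \@records;
}

# The iterator can wrap any source; an in-memory list stands in for a log here.
my @sample = (
    "2024-01-02 03:04:05 L1 connect: alpha\n",
    "not a log line\n",
    "2024-01-02 03:04:06 L2 connect: beta\n",
);
my $records = decode_log(sub { shift @sample });

# Print each decoded record using a hash slice.
printf "%s L%s %s\n", @{$_}{qw(time line event)} for @$records;
```

In the script, the same function would be called as `decode_log(sub { scalar <ARGV> })`, so the choice of input stays with the caller and the module can be tested without touching any files.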