perl

How to pick the last column from the data and print its sum?


I'm trying with:

my @header = split(/\,/,$recIn,-1);
print "Header Feed Date: $header[2]\n";
chop($header[2]);

Solution

    use warnings;
    use strict;
    use feature 'say';
    
    my $tot = 0;
    
    my $header = <>; 
    
    while (<>) { 
        $tot += (split /,/)[-1];    
    }
    
    say join ',', 'FINAL', $.-1, $tot;
    

    The <> operator reads lines from the files named on the command line when the program is invoked (or from STDIN if none are given). The $. variable holds the current line number of the last filehandle read from.
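The `$. - 1` count assumes a single input file. With several files, `<>` reads them as one stream and `$.` is not reset between them. Below is a sketch of the `close ARGV if eof` idiom from the `eof` documentation, keeping a count and a total per file. The two sample files are hypothetical, created with File::Temp only so that the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample files: a header line followed by comma-separated data
my @contents = ("id,name,amount\n1,a,10\n2,b,20\n",
                "id,name,amount\n3,c,5\n");
my @files;
for my $text (@contents) {
    my ($fh, $name) = tempfile(UNLINK => 1);   # removed at program exit
    print {$fh} $text;
    close $fh;
    push @files, $name;
}

local @ARGV = @files;       # what <> would normally get from the command line
my (%count, %tot);
while (<>) {
    next if $. == 1;                   # skip each file's header line
    $tot{$ARGV} += (split /,/)[-1];    # $ARGV holds the current file name
    $count{$ARGV} = $. - 1;            # data lines seen so far in this file
}
continue {
    close ARGV if eof;                 # reset $. at the end of each file
}

for my $file (@files) {
    printf "%s: %d lines, total %s\n", $file, $count{$file}, $tot{$file};
}
```

Without the `continue` block, the second file's header would be line 4, so it would not be skipped and would be counted as data.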

    Numbers (including the number of their decimal places) are nicely formatted with printf.
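As a small illustration, `sprintf` (which takes the same formats as `printf`) can build the summary line with a fixed number of decimals. The count and total below are hypothetical values.

```perl
use strict;
use warnings;

# Hypothetical values: 3 data lines whose amounts sum to 61.5
my ($count, $tot) = (3, 61.5);

# %d for the integer count, %.2f to always show two decimal places
my $line = sprintf "FINAL,%d,%.2f", $count, $tot;
print "$line\n";    # FINAL,3,61.50
```

With `printf` directly, the same would be `printf "FINAL,%d,%.2f\n", $. - 1, $tot;` in place of the `say` line.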


    The above processes every line of a file after the header. However, the FINAL... line shown in the question is the last line of the input file, and it needs to be checked. It also must not be added to the total, so we need to add a check:

    while (<>) { 
        if (/^\s*FINAL/) { 
            my @fields = split /,/;
            # $. also counts the header and this FINAL line, hence - 2
            if ($. - 2 != $fields[1] or $tot != $fields[-1]) {
                warn "Wrong FINAL line: $_";
            }
            last;
        }
    
        $tot += (split /,/)[-1];    
    }
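To see the line-number arithmetic at the FINAL line, here is a self-contained sketch on hypothetical in-memory data. It opens a reference to a string instead of reading `<>`. By the time the FINAL line is read, `$.` has counted the header, every data line and the FINAL line itself, so the number of data lines is `$. - 2`.

```perl
use strict;
use warnings;

# Hypothetical input: header, three data lines, and a FINAL summary line
my $data = "id,name,amount\n1,a,10\n2,b,20\n3,c,30\nFINAL,3,60\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $header = <$fh>;
my ($tot, $n_data) = (0, 0);
while (my $line = <$fh>) {
    if ($line =~ /^\s*FINAL/) {
        # Here $. counts the header, the data lines, and this FINAL line
        $n_data = $. - 2;
        last;
    }
    $tot += (split /,/, $line)[-1];
}
print "data lines: $n_data, total: $tot\n";   # data lines: 3, total: 60
```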
    

    But then each line is processed more than once (matched by a regex, then split). It is cleaner, and more amenable to validating input data, to split each line into an array once and then use that array:

    while (<>) {
        my @fields = split /,/;
    
        # Is this the last line, with a summary to be checked?
        if ($fields[0] eq 'FINAL') {
            # $. also counts the header and this FINAL line, hence - 2
            if ($. - 2 != $fields[1] or $tot != $fields[-1]) {
                warn "Wrong FINAL line: $_";
            }
            last;
        }
    
        # Validate input if/as needed
    
        $tot += $fields[-1];
    }
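The check `$tot != $fields[-1]` compares floating-point numbers exactly. With decimal amounts, the accumulated sum can differ in the last bits from the number written in the FINAL line, giving a spurious warning. One option, sketched here, is to compare after rounding to the number of decimal places the file uses. Two decimals is an assumption about the data.

```perl
use strict;
use warnings;

# Hypothetical amounts with decimals; the FINAL line would state 0.3
my $tot = 0;
$tot += $_ for 0.1, 0.2;
my $stated = '0.3';

my $exact_match   = ($tot == $stated) ? 1 : 0;   # 0: binary floating point
# Compare at two decimal places (an assumption about the data's precision)
my $rounded_match = (sprintf('%.2f', $tot) eq sprintf('%.2f', $stated)) ? 1 : 0;

print "exact: $exact_match, rounded: $rounded_match\n";   # exact: 0, rounded: 1
```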
    

    Building this extra array has a visible (negative) effect on efficiency only if the file is rather large or many such files are processed. And if the input data needs to be checked for format or validity, or pre-processed in any way, then there is no choice anyway. (For very short files this version may even be more efficient, but that would be hard to detect since the saving concerns only one line.)
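Whether the extra array matters for a given input is easy to measure with the core Benchmark module. The sketch below compares the two loop styles on hypothetical generated data. The timings depend on the machine and the perl version, so no winner is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical input: 10_000 data lines with the amount in the last column
my @lines = map { "$_,name$_,$_\n" } 1 .. 10_000;

# Variant 1: regex test on every line, then split to sum
sub sum_regex_then_split {
    my $tot = 0;
    for (@lines) {
        next if /^\s*FINAL/;
        $tot += (split /,/)[-1];
    }
    return $tot;
}

# Variant 2: split each line once into an array, then inspect the array
sub sum_split_once {
    my $tot = 0;
    for (@lines) {
        my @fields = split /,/;
        next if $fields[0] eq 'FINAL';
        $tot += $fields[-1];
    }
    return $tot;
}

# Run each variant for at least 1 CPU second and print a comparison table
cmpthese(-1, {
    regex_then_split => \&sum_regex_then_split,
    split_once       => \&sum_split_once,
});
```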

    Another option, if the file isn't too large, is to read all lines into an array first, peel off the unneeded first line (header) and the last summary line, and process the rest:

    my @lines = <>;
    chomp @lines;
    
    my $header = shift @lines;
    
    my $final = pop @lines;
    
    foreach my $line (@lines) { 
        # or split into an array for checking of various fields etc
        $tot += (split /,/, $line)[-1];
    }
    say join ',', 'FINAL', scalar @lines, $tot;
    
    CHECK_FINAL_LINE: { 
        my @fields = split /,/, $final;
        if ( $fields[1] != @lines or $fields[-1] != $tot ) { 
            warn "FINAL summary line wrong: $final";
        }
    }
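The code above pops the last line without confirming that it is the summary line. Here is a self-contained sketch of the same approach with that guard added. It runs on hypothetical in-memory data instead of the files given to `<>`.

```perl
use strict;
use warnings;

# Hypothetical input held in memory instead of files named on the command line
my $data = "id,name,amount\n1,a,10\n2,b,20\n3,c,30\nFINAL,3,60\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @lines = <$fh>;
chomp @lines;

my $header = shift @lines;
my $final  = pop @lines;

# Guard: make sure the line we popped really is the summary line
die "Last line is not a FINAL summary: $final\n" unless $final =~ /^\s*FINAL,/;

my $tot = 0;
$tot += (split /,/, $_)[-1] for @lines;

my @fields = split /,/, $final;
my $ok = ($fields[1] == @lines && $fields[-1] == $tot) ? 1 : 0;
print "FINAL,", scalar @lines, ",$tot (stated line matches: $ok)\n";
```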
    

    Now we avoid checking each and every line for whether it is the last.

    The number of data lines is the length of @lines, which is what the array yields in scalar context, as in the numeric comparison (!=) in the last block. I put the check in a block to avoid introducing a @fields array for the rest of the program (this way it is scoped to that block only), and named the block CHECK_FINAL_LINE for clarity. The block isn't necessary.

    In the line that prints the calculated FINAL summary, however, @lines is in list context, imposed by the argument list of join (inside say). So we need an explicit scalar to get the count instead of the array's elements.
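A small demonstration of the two contexts, using hypothetical data lines:

```perl
use strict;
use warnings;

my @lines = ('1,a,10', '2,b,20', '3,c,30');   # hypothetical data lines
my $tot   = 60;

# In join's argument list, @lines is flattened into its elements
my $flattened = join ',', 'FINAL', @lines, $tot;
# scalar forces the element count instead
my $counted   = join ',', 'FINAL', scalar @lines, $tot;

print "$flattened\n";   # FINAL,1,a,10,2,b,20,3,c,30,60
print "$counted\n";     # FINAL,3,60

# A numeric comparison already imposes scalar context, so no scalar is needed
print "three lines\n" if @lines == 3;
```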