Cleanest Perl parser for Makefile-like continuation lines


A Perl script I'm writing needs to parse a file that has continuation lines, much like a Makefile: a line that begins with whitespace is part of the previous line.

I wrote the code below, but it doesn't feel very clean or Perl-ish to me (heck, it doesn't even use "redo"!).

There are many edge cases: EOF at odd places, single-line files, files that start or end with a blank line (or non-blank line, or continuation line), empty files. All my test cases (and code) are here: http://whatexit.org/tal/flatten.tar

Can you write cleaner, more Perl-ish code that passes all my tests?

#!/usr/bin/perl -w

use strict;

sub process_file_with_continuations {
    my $processref = shift @_;
    my $nextline;
    my $line = <ARGV>;

    $line = '' unless defined $line;
    chomp $line;

    while (defined($nextline = <ARGV>)) {
        chomp $nextline;
        next if $nextline =~ /^\s*#/;  # skip comments
        $nextline =~ s/\s+$//g;  # remove trailing whitespace
        if (eof()) {  # Handle EOF
            $nextline =~ s/^\s+/ /;
            if ($nextline =~ /^\s+/) {  # indented line
                &$processref($line . $nextline);
            }
            else {
                &$processref($line);
                &$processref($nextline) if $nextline ne '';
            }
            $line = '';
        }
        elsif ($nextline eq '') {  # blank line
            &$processref($line);
            $line = '';
        }
        elsif ($nextline =~ /^\s+/) {  # indented line
            $nextline =~ s/^\s+/ /;
            $line .= $nextline;
        }
        else {  # non-indented line
            &$processref($line) unless $line eq '';
            $line = $nextline;
        }
    }
    &$processref($line) unless $line eq '';
}

sub process_one_line {
    my $line = shift @_;
    print "$line\n";
}

process_file_with_continuations \&process_one_line;

Solution

  • How about slurping the whole file into memory and processing it with regular expressions? That is much more 'perlish'. This passes your tests and is much smaller and neater:

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    $/ = undef;             # we want no input record separator.
    my $file = <>;          # slurp whole file
    
    $file =~ s/^\n//;       # Remove newline at start of file
    $file =~ s/\s+\n/\n/g;  # Remove trailing whitespace.
    $file =~ s/\n\s*#[^\n]*//g;     # Remove comments (including a bare "#").
    $file =~ s/\n\s+/ /g;   # Merge continuations
    
    # Done
    print $file;
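
If the file could be too large to slurp, or you want to keep the one-callback-per-logical-line interface from the question, the same rules can be applied line by line. The following is only a sketch under a few assumptions: `each_logical_line` is a made-up name, comment lines are skipped wherever they appear, a blank line ends the current record, and an indented line with nothing before it simply starts a new record. It has not been run against the tests in flatten.tar.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): reads $fh line by line,
# joins continuation lines, and calls $callback once per logical line.
sub each_logical_line {
    my ($fh, $callback) = @_;
    my $pending;                               # logical line being assembled

    while (defined(my $line = <$fh>)) {
        chomp $line;
        next if $line =~ /^\s*#/;              # skip comment lines
        $line =~ s/\s+$//;                     # strip trailing whitespace

        if ($line eq '') {                     # blank line ends the record
            $callback->($pending) if defined $pending;
            undef $pending;
            next;
        }

        # s/// returns true only if the line started with whitespace.
        my $is_continuation = ($line =~ s/^\s+//);

        if ($is_continuation && defined $pending) {
            $pending .= " $line";              # append with a single space
        }
        else {                                 # a new record starts here
            $callback->($pending) if defined $pending;
            $pending = $line;
        }
    }
    $callback->($pending) if defined $pending; # flush whatever is left at EOF
}

# Usage: read from an in-memory filehandle and collect the logical lines.
my $input = "target: a\n    b\n    c\n# comment\n\nother\n";
open my $fh, '<', \$input or die "open: $!";
my @got;
each_logical_line($fh, sub { push @got, $_[0] });
print "$_\n" for @got;
# prints:
#   target: a b c
#   other
```

Keeping the pending record undefined until a line arrives means the empty file, the single-line file, and a file ending without a trailing newline all go through the same final flush, so there is no need for a separate `eof()` branch.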