
Why won't this regex remove final whitespace from Pod::Usage text?


I am working on a module that relies on Pod::Usage to parse the calling script's POD and send the usage, help, and man text to a scalar variable. I needed to strip the trailing whitespace from that text, so I used a simple regex that I expected to work. It did, but only intermittently.

Here's a demonstration of the problem. Any insights would be appreciated.

The unexpected behavior (the regex failing to remove the trailing newlines) occurs consistently on my Solaris machine with Perl 5.10.1. Under Windows with Perl 5.12.1, it is erratic (sample output below).

use strict;
use warnings;

use Pod::Usage qw(pod2usage);
use Test::More;

# Baseline test to show that the regex works.
my $exp                      = "foo\nbar\n...";
my $with_trailing_whitespace = $exp . "   \n\n";
$with_trailing_whitespace    =~ s!\s+\Z!!;
my $ords = get_ords_of_final_chars($with_trailing_whitespace);
is_deeply $ords, [46, 46, 46]; # String ends with 3 periods (not whitespace).

# Run a similar test, using text from Pod::Usage.
for (1 .. 2){
    my $pod = get_pod_text();
    $ords = get_ords_of_final_chars($pod);
    is_deeply $ords, [46, 46, 46];
}

done_testing();

sub get_ords_of_final_chars {
    # Takes a string. Returns an array ref of the ord() values of its last 3 characters, last character first.
    my $s = shift;
    return [ map ord(substr $s, - $_, 1), 1 .. 3 ];
}

sub get_pod_text {
    # Call pod2usage(), sending output to a scalar.
    open(my $fh, '>', \my $txt) or die $!;
    pod2usage(-verbose => 2, -exitval => 'NOEXIT', -output  => $fh);
    close $fh;   # Closing the handle before the substitution doesn't fix it.

    # Here's the same regex as above.
    # 
    # If I use chomp(), the newlines are consistently removed:
    #     1 while chomp($txt);
    $txt =~ s!\s+\Z!!;
    return $txt; 
}

__END__

=head1 NAME

sample - Some script...

=head1 SYNOPSIS

foo.pl ARGS...

=head1 DESCRIPTION

This program will read the given input file(s) and do something
useful with the contents thereof...

=cut

Output of two runs on my Windows box:

$ perl demo.pl
ok 1
not ok 2
#   Failed test at demo.pl line 18.
#     Structures begin differing at:
#          $got->[0] = '10'
#     $expected->[0] = '46'
not ok 3
#   Failed test at demo.pl line 18.
#     Structures begin differing at:
#          $got->[0] = '10'
#     $expected->[0] = '46'
1..3
# Looks like you failed 2 tests of 3.

$ perl demo.pl
ok 1
ok 2
ok 3
1..3
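The question notes that 1 while chomp($txt) removes the newlines consistently. chomp only removes the input record separator ($/, normally "\n"), not trailing spaces. So one hedged workaround, while the regex behavior remains unexplained, is to trim without any end-of-string anchor at all. The following is a minimal sketch. The helper name trim_trailing_ws is hypothetical and is not part of Pod::Usage. It treats only space, tab, CR, LF, and form feed as whitespace, which is narrower than \s.

```perl
use strict;
use warnings;

# Hypothetical helper: strip trailing whitespace without relying on \Z or \z.
# Only space, tab, CR, LF and form feed count as whitespace here (narrower than \s).
sub trim_trailing_ws {
    my ($s) = @_;
    my $end = length $s;
    # Walk backwards while the character just before $end is one of the listed whitespace chars.
    $end-- while $end > 0 && index(" \t\r\n\f", substr($s, $end - 1, 1)) >= 0;
    return substr($s, 0, $end);
}

my $text    = "foo\nbar\n...   \n\n";
my $trimmed = trim_trailing_ws($text);

# Same check as get_ords_of_final_chars() in the demo: ord() of the last 3 characters.
print join(',', map { ord(substr($trimmed, -$_, 1)) } 1 .. 3), "\n";    # prints 46,46,46
```

Because it only calls length, substr, and index, this version sidesteps the regex engine entirely. That can help narrow down whether the problem lies in the pattern match or in the string produced by the in-memory filehandle.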

Solution

  • To quote perlre:

    \Z Match only at end of string, or before newline at the end
    \z Match only at end of string
    

    So, you should be using $txt =~ s!\s+\z!!; (lower case z).

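    That said, because \s+ is greedy, the match should already run from the first trailing whitespace character to the very end of the string, so I would have expected the \Z version to work as well. This may be a Perl bug.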
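The difference between \Z and \z, and why greediness matters, can be checked with a small Test::More script in the same style as the question's demo. This is a sketch that runs on plain strings rather than on Pod::Usage output. It illustrates the documented anchor semantics, not the intermittent failure itself. The last case uses a non-greedy \s+? (not part of the original code) to show where \Z stops when nothing forces the match past the final newline.

```perl
use strict;
use warnings;
use Test::More;

# \Z may also match just before a string-final newline; \z matches only at the very end.
ok "abc\n" =~ /c\Z/, '\Z matches before a trailing newline';
ok "abc\n" !~ /c\z/, '\z does not';

my $s = "foo...   \n\n";

# Greedy \s+ with \z: everything from the first trailing space to the end is removed.
(my $with_z = $s) =~ s/\s+\z//;
is $with_z, "foo...", 's/\s+\z// strips all trailing whitespace';

# Greedy \s+ with \Z: normally removes everything too, because \s+ runs to the end.
(my $with_Z = $s) =~ s/\s+\Z//;
is $with_Z, "foo...", 's/\s+\Z// normally does as well';

# Non-greedy \s+? with \Z stops as soon as \Z can match, i.e. before the final "\n".
(my $lazy_with_Z = $s) =~ s/\s+?\Z//;
is $lazy_with_Z, "foo...\n", 's/\s+?\Z// leaves the final newline';

done_testing();
```

With \z, the result no longer depends on \s+ consuming the final newline, so it is the safer anchor for "strip everything at the end" regardless of greediness.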