Search code examples
perlstdin

How is perl -i *really* implemented?


In the description of the Perl -i[extension] feature at http://perldoc.perl.org/perlrun.html, code that is materially identical to the following program is given as "an equivalent" to using perl -pi.orig ...:

#!/usr/bin/perl

use strict;
use warnings;

my $extension = '.orig';
my $oldargv = '';
my $backup;
LINE: while (<>) {
    if ($ARGV ne $oldargv) {
        if ($extension !~ /\*/) {
            $backup = $ARGV . $extension;
        } else {
            ($backup = $extension) =~ s/\*/$ARGV/g;
        }
        rename($ARGV, $backup);
        open(ARGVOUT, ">$ARGV");
        select(ARGVOUT);
        $oldargv = $ARGV;
    }
    # Don't change anything; just copy.
}
continue {
    print;
}
select(STDOUT);

This works fine when $extension eq '.orig'; however, Perl defines -i with no extension as well (that is, for $extension eq ''). Perl's defined behavior is to edit the file in place, with no backup file created:

If no extension is supplied, and your system supports it, the original file is kept open without a name while the output is redirected to a new file with the original filename. When perl exits, cleanly or not, the original file is unlinked.

Perhaps my system (Mac OS X Yosemite 10.10.3) doesn't support it.

If I set $extension = '' in this program, then the code will work fine for files smaller than one block of STDIN (4096 bytes in AcivePerl 5.10, but 8192 bytes by ActivePerl 5.16), but it will not work for files larger than one block.

It looks to me that, on my system, if $ARGV and $backup have the same value (which they will if $extension eq '', then the open(ARGVOUT, ">$ARGV") call on line 17 clobbers the input file after one block of it has been read.

I can work around this, of course, by writing to a temporary file and then renaming it at the end. But I'm a bit disappointed, after a couple hours of debugging, that the example in perlrun isn't as general-purpose as I had expected.

  1. Is there a standard, idiomatic way to deal with the $extension eq '' case?

  2. Is this $extension eq '' use-case important enough that perlrun should be edited? Of course, the "and your system supports it" clause means that the example is not incorrect, but the example would be more useful if it covered this case, too.


Solution

  • Perl 5.28 changed -i. This answer pertains to earlier versions of Perl.


    When an extension is provided:

    open(my $fh_in,  '<', $qfn);
    rename($qfn, "$qfn$ext");
    open(my $fh_out, '>', $qfn);
    

    This can be seen using strace.

    $ strace perl -i~ -pe1 a
    ...
    open("a", O_RDONLY)                     = 3
    rename("a", "a~")                       = 0
    open("a", O_WRONLY|O_CREAT|O_EXCL, 0600) = 4
    ...
    

    When no extension is provided:

    open(my $fh_in,  '<', $qfn);
    unlink($qfn);
    open(my $fh_out, '>', $qfn);
    

    This can be seen using strace.

    $ strace perl -i -pe1 a
    ...
    open("a", O_RDONLY)                     = 3
    unlink("a")                             = 0
    open("a", O_WRONLY|O_CREAT|O_EXCL, 0600) = 4
    ...
    

    Unix systems such as Macs support anonymous files. Windows does not, so -i requires an extension there.

    >perl -i.bak -pe1 a
    
    >perl -i -pe1 a
    Can't do inplace edit without backup.
    

    If we integrate this knowledge in the code you posted, we get the following:

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    my $extension = '.orig';
    my $oldargv = '';
    my $backup;
    LINE: while (<>) {
        if ($ARGV ne $oldargv) {
            if (length($extension)) {
                if ($extension !~ /\*/) {
                    $backup = $ARGV . $extension;
                } else {
                    ($backup = $extension) =~ s/\*/$ARGV/g;
                }
                rename($ARGV, $backup);
            } else {
                die("Can't do inplace edit without backup.\n") if $^O eq 'MSWin32';
                unlink($ARGV);
            }
            open(ARGVOUT, ">$ARGV");
            select(ARGVOUT);
            $oldargv = $ARGV;
        }
        # Don't change anything; just copy.
    }
    continue {
        print;
    }
    select(STDOUT);