perl bash http-redirect buffering

How do I disable stdout redirection-to-file buffering in Perl?


Here's a script that launches 10 processes, each writing 100,000 lines to its STDOUT, which is inherited from the parent:

#!/usr/bin/env perl
# buffering.pl
use 5.10.0;
use strict;
use warnings FATAL => "all";
use autodie;

use Parallel::ForkManager;
my $pm = Parallel::ForkManager->new(4);

$|=1; # don't think this does anything with syswrite...

# start 10 jobs which write 100,000 lines each
for (1 .. 10 ) {
    $pm->start and next;

    for my $j (1 .. 100_000) {
        syswrite(\*STDOUT,"$j\n");
    }

    $pm->finish;
}
$pm->wait_all_children;

If I pipe to another process, all is well:

$ perl buffering.pl | wc -l
1000000

But if I redirect to a file on disk, the syswrites clobber each other:

$ perl buffering.pl > tmp.txt ; wc -l tmp.txt
457584 tmp.txt

What's more, if I open write-file handles in the child processes and write directly to tmp.txt:

#!/usr/bin/env perl
# buffering2.pl
use 5.10.0;
use strict;
use warnings FATAL => "all";
use autodie;

use Parallel::ForkManager;
my $pm = Parallel::ForkManager->new(4);

$|=1;

for (1 .. 10) {
    $pm->start and next;
    open my $fh, '>', 'tmp.txt';

    for my $j (1 .. 100_000) {
        syswrite($fh,"$j\n");
    }
    close $fh;

    $pm->finish;
}
$pm->wait_all_children;

tmp.txt has 100,000 lines, as expected.

$ perl buffering2.pl; wc -l tmp.txt
100000 tmp.txt

So redirection via '>' to disk has some sort of buffering but redirection to a process doesn't? What's the deal?


Solution

  • When you redirect the whole Perl script, you get a single file descriptor (created by the shell when you do > tmp.txt and inherited by perl as stdout), which is dup'd to each child. Those duplicates all refer to the same open file in the kernel, so they share one file offset. When you explicitly open the file in each child, you get different file descriptors (not dups of the original), each with its own offset. You should be able to replicate the shell redirection case if you hoist open my $fh, '>', 'tmp.txt' out of your loop.

    The pipe case works because you're writing to a pipe and not a file: a pipe has no notion of an offset, so there is nothing for the processes to inadvertently share in the kernel.
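The shared-offset explanation above can be checked with a small sketch. Everything named here (offset_of, write_in_child, lines_after_parallel_writes, the temp files) is hypothetical and not from the original post. The append-mode part is an assumption about a possible fix rather than something the answer proposes: a handle opened with '>>' uses O_APPEND, so the kernel places each write at the current end of the file, and a shared offset should no longer matter on a local filesystem.

```perl
#!/usr/bin/env perl
# Sketch only: helper names and temp files are illustrative, not from the post.
use strict;
use warnings;
use Fcntl qw(SEEK_CUR);
use File::Temp qw(tempfile);
use POSIX ();

# Current offset of a handle, read without moving it.
sub offset_of {
    my ($fh) = @_;
    return 0 + sysseek($fh, 0, SEEK_CUR);    # "0 but true" becomes 0
}

# Fork one child that writes $text to the inherited handle, then reap it.
sub write_in_child {
    my ($fh, $text) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        syswrite($fh, $text);
        POSIX::_exit(0);                     # leave without running END blocks
    }
    waitpid($pid, 0);
    return;
}

# Open a fresh temp file once with $mode ('>' or '>>'), let $workers children
# write $lines lines each through the inherited handle, and count the result.
sub lines_after_parallel_writes {
    my ($mode, $workers, $lines) = @_;
    my ($tmp, $path) = tempfile();
    close $tmp;
    open(my $out, $mode, $path) or die "open $path: $!";
    my @pids;
    for my $w (1 .. $workers) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            syswrite($out, "$w:$_\n") for 1 .. $lines;
            POSIX::_exit(0);
        }
        push @pids, $pid;
    }
    waitpid($_, 0) for @pids;
    close $out;
    open(my $in, '<', $path) or die "open $path: $!";
    my $count = 0;
    $count++ while <$in>;
    close $in;
    unlink $path;
    return $count;
}

# 1. A handle inherited across fork shares its offset with the parent.
my ($fh, $path) = tempfile();
write_in_child($fh, "child\n");              # 6 bytes written by the child
print "inherited handle offset: ", offset_of($fh), "\n";    # 6

# 2. A handle opened separately on the same file has its own offset.
open(my $own, '<', $path) or die "open $path: $!";
print "separate handle offset: ", offset_of($own), "\n";    # 0
close $own;
close $fh;
unlink $path;

# 3. With '>>' every write is appended at end of file, so no lines are lost.
print "append mode lines: ", lines_after_parallel_writes('>>', 4, 1000), "\n";
```

Calling lines_after_parallel_writes('>', 4, 1000) instead mirrors hoisting the open out of the loop and may or may not lose lines depending on the kernel. For the original script, the shell-level equivalent of the append variant would be perl buffering.pl >> tmp.txt, starting from an empty or absent tmp.txt.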