Search code examples
perlipcperl-ipc-run

Perl IPC::Run pipeline blocks with input file larger than 64KiB


A Perl program uses IPC::Run to pipe a file through a series of commands determined at runtime and into another file, like this small test excerpt demonstrates:

#!/usr/bin/perl
use IO::File;
use IPC::Run qw(run);

open (my $in, 'test.txt');
my $out = IO::File->new_tmpfile;

my @args = ( [ split / /, shift ], "<", $in); # this code
while ($#ARGV >= 0) {                         # extracted
    push @args, "|", [ split / /, shift ];    # verbatim
}                                             # from the
push @args, ">pipe", $out;                    # program

print "Running...";
run @args or die "command failed ($?)";
print "Done\n";

It builds the pipeline from commands given as arguments, the test file is hard-coded. The problem is that the pipeline hangs if the file is bigger than 64KiB. Here is a demonstration that uses cat in the pipeline to keep things simple. First a 64KiB (65536 bytes) file works as expected:

$ dd if=/dev/urandom of=test.txt bs=1 count=65536
65536 bytes (66 kB, 64 KiB) copied, 0.16437 s, 399 kB/s
$ ./test.pl cat
Running...Done

Next, one byte more. The call to run never returns...

$ dd if=/dev/urandom of=test.txt bs=1 count=65537
65537 bytes (66 kB, 64 KiB) copied, 0.151517 s, 433 kB/s
$ ./test.pl cat
Running...

With IPCRUNDEBUG enabled, plus a few more cats you can see it's the last child that doesn't end:

$ IPCRUNDEBUG=basic ./test.pl cat cat cat cat
Running...
...
IPC::Run 0000 [#1(3543608)]: kid 1 (3543609) exited
IPC::Run 0000 [#1(3543608)]: 3543609 returned 0
IPC::Run 0000 [#1(3543608)]: kid 2 (3543610) exited
IPC::Run 0000 [#1(3543608)]: 3543610 returned 0
IPC::Run 0000 [#1(3543608)]: kid 3 (3543611) exited
IPC::Run 0000 [#1(3543608)]: 3543611 returned 0

(with a file under 64KiB you see all four exit normally)

How can this be made to work for files of any size ?

(Perl 5, version 30, subversion 3 (v5.30.3) built for x86_64-linux-thread-multi, tried on Alpine Linux, the target platform, and Arch Linux to rule out Alpine as a cause)


Solution

  • You have a deadlock:

    Diagram of the deadlock

    Consider using one of the following instead:

    run [ 'cat' ], '<', $in_fh, '>', \my $captured;
    
    # Do something with the captured output in $captured.
    

    or

    my $receiver = sub {
        # Do something with the chunk in $_[0].
    };
    
    run [ 'cat' ], '<', $in_fh, '>', $receiver;
    

    For example, the following "receiver" processes each line as they come in:

    my $buffer = '';
    my $receiver = sub {
        $buffer .= $_[0];
        while ($buffer =~ s/^(.*)\n//) {
           process_line("$1");
        }
    };
    
    run [ 'cat' ], '<', $in_fh, '>', $receiver;
    
    die("Received partial line") if length($buffer);