
Need help understanding Perl code: multi-process / fork


I was looking for an example of how to limit the number of forked processes running at the same time, and I ran across this old code:

#!/usr/bin/perl
#total forks, max children, what to run
#function takes 2 scalars and a reference to code to run
sub mfork ($$&) {
        my ($count, $max, $code) = @_;
        # total number of processes to spawn
        foreach my $c (1 .. $count) {
                #what is happening here? why wait vs waitpid?
                wait unless $c <= $max;
                die "Fork failed: $!\n" unless defined (my $pid = fork);
                # i don't understand the arrow notation here and how it calls a function,
                # also, is 'unless $pid' saying: run the function unless you're the parent?
                exit $code -> ($c) unless $pid;
        }
        #no idea what's happening here, why are we waiting twice? for the last process?
        #why 1 until (-1 == wait)? what's 1 doing here
        1 until -1 == wait;
}


#code to run
mfork 10, 3, sub {
        print "$$: " . localtime() . ": Starting\n";
        select undef, undef, undef, rand 2;
        print "$$: " . localtime() . ": Exiting\n";
};

Solution

  • Let's take a look at the code. It is yours, with most of your comments removed; all remaining comments are mine.

    #!/usr/bin/perl
    # total forks, max children, what to run
    # function takes 2 scalars and a reference to code to run
    
    sub mfork ($$&) {
            my ($count, $max, $code) = @_;
    
            # total number of processes to spawn
            foreach my $c (1 .. $count) {
    
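                    # Once $max children are running, block until any one
                    # of them exits before forking the next. wait reaps
                    # whichever child finishes first; waitpid would wait
                    # for one specific pid.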
                    wait unless $c <= $max;
    
                    die "Fork failed: $!\n" unless defined (my $pid = fork);
    
                    # The arrow dereferences and calls the coderef in $code,
                    # passing $c as the argument. The spaces around it make
                    # it confusing: it is a plain deref arrow, although it
                    # looks like an OOP method call.
                    # You're right about the 'unless $pid' part.
                    # fork returns the child's pid in the parent, so the
                    # parent skips this line. In the child it returns 0,
                    # so the child runs the code and exits, using the
                    # code's return value as its exit status.
    
                    exit $code -> ($c) unless $pid;
            }
    

            # The parent gets here once the foreach has forked all $count
            # children. Inside the loop it blocked on wait whenever $max
            # children were already running, so each child that exited
            # made room for the next one. Up to $max children can still
            # be running at this point, so the parent reaps them here.
            # wait returns -1 when there are no children left.
            # '1 until COND' is a statement-modifier loop: it evaluates
            # the constant 1 (a no-op) until the condition becomes true,
            # so it is an until loop with an empty body.
            # Once wait returns -1 the loop ends and the sub returns.
            1 until -1 == wait;
    }
    
    
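    # mfork is already declared above, so the call needs no parentheses.
    # The ($$&) prototype imposes scalar context on the first two arguments
    # and requires the third to be an anonymous sub or a \&name reference.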
    mfork 10, 3, sub {
            print "$$: " . localtime() . ": Starting\n";
            select undef, undef, undef, rand 2;
            print "$$: " . localtime() . ": Exiting\n";
    };
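
    To see the final reaping loop and the child's exit status in isolation, here is a minimal sketch. The helpers spawn_child and reap_all are hypothetical names made up for this illustration; they are not part of the code above. It assumes a Unix-like system where fork and wait behave as described in perlfunc.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical helper: fork one child that exits at once with the
    # given status, and return the child's pid to the parent.
    sub spawn_child {
        my ($status) = @_;
        defined(my $pid = fork) or die "Fork failed: $!\n";
        exit $status unless $pid;    # child: fork returned 0, so leave
        return $pid;                 # parent: fork returned the child's pid
    }

    # Hypothetical helper: reap every outstanding child and return a hash
    # mapping pid => exit status. wait returns -1 once no children remain.
    sub reap_all {
        my %status_of;
        while ((my $pid = wait()) != -1) {
            $status_of{$pid} = $? >> 8;    # high byte of $? is the exit status
        }
        return %status_of;
    }

    my @pids   = map { spawn_child($_) } 0 .. 2;
    my %status = reap_all();
    print "$_ exited with $status{$_}\n" for @pids;
    print "wait now returns ", wait(), "\n";    # no children left to reap
    ```

    The while loop in reap_all does the same job as '1 until -1 == wait', but keeps each pid and status instead of discarding them.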
    

    Let's look at the details.

    • There are wait and waitpid. wait blocks until any one of the children exits. That is what we want here, because the program doesn't care which slot gets freed: as soon as one child finishes, a new one can be spawned. waitpid takes a specific $pid as its argument and waits for that child only, which is not helpful here.
    • The $code->($c) syntax calls a coderef. Just as ${ $foo }{bar} dereferences a hashref, &{ $baz }() dereferences a coderef and calls it (the call is the ()). The easier-to-read form for the hashref is $foo->{bar}, and the same goes for $baz->(): the arrow does the dereferencing. See perlref and perlreftut.
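
    Both points can be tried out in a few lines. This is only a sketch: $foo and $baz are the placeholder names used above, and child_exiting_with is a hypothetical helper invented for the example.

    ```perl
    use strict;
    use warnings;

    # --- Dereferencing: block form versus arrow form ---
    my $foo = { bar => 42 };
    my $baz = sub { my ($n) = @_; return $n * 2 };

    my $h_block = ${ $foo }{bar};    # block form on a hashref
    my $h_arrow = $foo->{bar};       # arrow form, same element
    my $c_block = &{ $baz }(21);     # block form on a coderef, called with 21
    my $c_arrow = $baz->(21);        # arrow form, same call
    my $c_space = $baz -> (21);      # whitespace around the arrow is allowed

    print "$h_block $h_arrow $c_block $c_arrow $c_space\n";

    # --- wait versus waitpid ---
    # Hypothetical helper: fork a child that exits with $status.
    sub child_exiting_with {
        my ($status) = @_;
        defined(my $pid = fork) or die "Fork failed: $!\n";
        exit $status unless $pid;    # child leaves immediately
        return $pid;                 # parent gets the child's pid
    }

    my $first  = child_exiting_with(1);
    my $second = child_exiting_with(2);

    # waitpid blocks until the named child has exited.
    my $reaped = waitpid($second, 0);
    print "waitpid reaped $reaped with status ", $? >> 8, "\n";

    # wait takes whichever child has finished; only $first is left.
    $reaped = wait();
    print "wait reaped $reaped with status ", $? >> 8, "\n";
    ```

    mfork only needs to know that some slot is free, never which one, which is why it uses plain wait.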

    While this is nice and useful, it probably makes more sense to use Parallel::ForkManager, which gives you the same power in far fewer lines of code and spares you from worrying about how it works.

    use strict;
    use warnings;
    use Parallel::ForkManager;
    
    my $pm = Parallel::ForkManager->new(3); # max 3 at the same time
    
    DATA_LOOP:
    foreach my $data (1 .. 10) {
      # In the parent, start returns the child's pid (true), so the parent
      # skips to the next item. In the child it returns 0 and falls through.
      my $pid = $pm->start and next DATA_LOOP;
    
      # Do some work with $data in the child process:
      print "$$: " . localtime() . ": Starting\n";
      select undef, undef, undef, rand 2;
      print "$$: " . localtime() . ": Exiting\n";
    
      $pm->finish; # Terminates the child process
    }
    
    $pm->wait_all_children; # Parent blocks until every child has finished
    

    That's it. Way clearer to read. :)