Search code examples
regexlistperlmerge

Merging non-zeros, non-overlapping elements from two lists in Perl


I'm looking for a clean method in Perl to merge a collection of lists. The all have the same length, and each consists mainly of zeros, but also has short contiguous segments of non-zero entries. For example, here are two representative lists of length 25:

@flags1 = qw( 0  0  0  0 21 22 23  0  0  0  0  0  0  0  0 41 42 43  0  0  0  0  0  0  0);
@flags2 = qw(11 12 13  0  0  0  0  0  0  0  0  0  0 31 32 33  0  0  0  0  0 51 52  53 0);

The objective is to merge the elements of @flags2 into @flags1 for all the places where a contiguous clump of non-zero elements in @flags2 replaces only zero entries in @flags1. If there is an overlap with any of the non-zero elements of @flags1, the associated contiguous clump of non-zero values in @flags2 is discarded instead of being merged.

Thus, for the example above, the contiguous clump of values 31, 32, and 33 in @flags2[13..15] are discarded because one of the entries, $flags2[15] is non-zero and collides with the non-zeros value at $flags1[15]. The resulting desired merged list would be:

@merged = qw(11 12 13  0 21 22 23  0  0  0  0  0  0  0  0 41 42 43  0  0  0 51 52  53  0);

I have experimented with collecting up contiguous elements of non-zero elements into a list of lists, and then comparing them using for and if statements, but it's a mess, and I think it will be difficult for any other developer to understand the logic. If anyone could propose an more elegant solution that would be much appreciated.


Solution

  • use List::Util qw( none );
    
    my $s = 0;
    while (1) {
       # Find start of next clump.
       ++$s while $s < @flags2 && !$flags2[$s];
    
       # Exit if at end of array.
       last if $s == @flags2;
    
       # Find end of clump.
       my $e = $s+1;
       ++$e while $e < @flags2 && $flags2[$e];
    
       # Merge in clump.
       my @clump = $s .. $e-1;
       if ( none { $_ } @flags1[ @clump ] ) {      # Or `!grep { $_ }`
          @flags1[ @clump ] = @flags2[ @clump ];
       }
    
       $s = $e;
    
       # Exit if at end of array.
       last if $s == @flags2;
    }
    

    This is another approach that is akin to the merge portion of a merge sort.

    sub get_next_clump {
       my ( $f, $s ) = @_;
       ++$s while $s < @$f && !$f[$s];
       return if $s == @$f;
       my $e = $s+1;
       ++$e while $e < @$f && $f[$e];
       return $s, $e;
    }
    
    my $ok1 = my ( $f1_s, $f1_e ) = get_next_clump( \@flags1, 0 );
    my $ok2 = my ( $f2_s, $f2_e ) = get_next_clump( \@flags2, 0 );
    
    while ( $ok1 && $ok2 ) {
       if ( $f2_s < $f1_e && $f2_e > $f1_s ) {
          $ok2 = ( $f2_s, $f2_e ) = get_next_clump( \@flags2, $f2_e );
          next;
       }
       
       if ( $f1_s < $f2_s ) {
          $ok1 = ( $f1_s, $f1_e ) = get_next_clump( \@flags1, $f1_e );
       } else {
          @flags1[ $f2_s .. $f2_e-1 ] = @flags2[ $f2_s .. $f2_e-1 ];
          $ok2 = ( $f2_s, $f2_e ) = get_next_clump( \@flags2, $f2_e );
       }
    }
    
    while ( $ok2 ) {
       @flags1[ $f2_s .. $f2_e-1 ] = @flags2[ $f2_s .. $f2_e-1 ];
       $ok2 = ( $f2_s, $f2_e ) = get_next_clump( \@flags2, $f2_e );
    }