Search code examples
sortingperlreference

Perl sort array creates new element if element being referenced


I have an array of strings, would like to record its original order in another array by referencing its elements, then I sort the array of strings, it creates new element in this array rather than the original element.

my @item = qw/zz xx/;
my @order;

foreach (@item) {
    push @order, \$_;
}

print $item[0] . " " . \$item[0] . "\n";
print $item[1] . " " . \$item[1] . "\n";

@item = sort {$a cmp $b} @item;

print $item[0] . " " . \$item[0] . "\n";
print $item[1] . " " . \$item[1] . "\n";

Result:

zz SCALAR(0x5618ffd27220)
xx SCALAR(0x5618ffd273e8)
xx SCALAR(0x5618ffd46668)
zz SCALAR(0x5618ffd46698)

If I comment out the reference block, then the outcome return to expected result. So can anyone explain how referencing the element would affect the result of "sort".

Expected Result:

zz SCALAR(0x5618ffd27220)
xx SCALAR(0x5618ffd273e8)
xx SCALAR(0x5618ffd273e8)
zz SCALAR(0x5618ffd27220)

Solution

  • Assigning a scalar to an element of an array makes a copy of the scalar.

    For example,

    use v5.14;
    my $x = 123;
    say \$x;
    my @a = $x;
    say \$a[0];
    

    outputs something like

    SCALAR(0x55dd65bf0e18)
    SCALAR(0x55dd65bc14b8)
    

    So the result you get is the result you should expect.


    But what about

    my @c = qw( l k j );
    say for \( @c );
    @c = sort @c;
    say for \( @c );
    
    SCALAR(0x55c396e0d4b8)
    SCALAR(0x55c396e0d4e8)
    SCALAR(0x55c396e0d680)
    SCALAR(0x55c396e0d680)
    SCALAR(0x55c396e0d4e8)
    SCALAR(0x55c396e0d4b8)
    

    sort has an optimization for the case when an array is being sorted, and the results are assigned to the same array (e.g. @a = sort @a;). In these situations, it performs an in-place sort. It simply moves the C pointers around without creating any new scalars.

    As long as nothing references the values of the array, it is safe to do so.


    But what about

    my @c = qw( l k j );
    my @r = \( @c );
    say for \( @c );
    @c = sort @c;
    say for \( @c );
    
    SCALAR(0x55abf592e4b8)
    SCALAR(0x55abf592e4e8)
    SCALAR(0x55abf592e680)
    SCALAR(0x55abf595d660)
    SCALAR(0x55abf595d648)
    SCALAR(0x55abf595d6f0)
    

    An optimization is only correct if it's transparent. An optimization that causes code to behave differently than unoptimized code is buggy. In other words, sort should work the same with or without the optimization. Which means that sort needs to make copies of values referenced by something other the array even when the optimization is used.

    It can get away from doing this if nothing else references the array values, but that's not the case here.


    Note that versions of Perl before 5.26 were buggy. These programs should output the same, but they don't:

    $ perl5.24t -le'@c = qw(l k j); $d = \$c[2]; @c = @e = sort @c; $$d = "X"; print @c'
    jkl
    
    $ perl5.24t -le'@c = qw(l k j); $d = \$c[2]; @c =      sort @c; $$d = "X"; print @c'
    Xkl
    

    This was fixed in 5.26. Since then, the sort is still done in-place, but a copy of every value with a refcount greater than one is made first.

    $ perl5.26t -le'@c = qw(l k j); $d = \$c[2]; @c = @e = sort @c; $$d = "X"; print @c'
    jkl
    
    $ perl5.26t -le'@c = qw(l k j); $d = \$c[2]; @c =      sort @c; $$d = "X"; print @c'
    jkl