Why does a string assignment result in double memory usage in Perl?


I ran the following Perl code on the command line with the -e option:

$s=q(top -bn1 -p $PPID); system $s; $v="\0"x(2*1024**3); print"\n"x3; system $s

Its output looks like this:

$ perl -e '$s=q(top -bn1 -p $PPID); system $s; $v="\0"x(2*1024**3); print"\n"x3;system $s'

top - 20:11:27 up 5 days,  4:10,  1 user,  load average: 0.24, 0.10, 0.08
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.3 us,  0.0 sy,  0.0 ni, 96.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   7937.6 total,   3065.8 free,   4144.2 used,    727.6 buff/cache
MiB Swap:   2048.0 total,     87.8 free,   1960.2 used.   3433.7 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
4133828 sam       20   0 2110500   2.0g   4712 S   0.0  25.9   0:02.35 perl



top - 20:11:31 up 5 days,  4:10,  1 user,  load average: 0.30, 0.11, 0.09
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   7937.6 total,   1029.3 free,   6180.7 used,    727.6 buff/cache
MiB Swap:   2048.0 total,     87.8 free,   1960.2 used.   1397.2 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
4133828 sam       20   0 4207656   4.0g   4712 S   0.0  51.7   0:06.17 perl

The first top run, which executes before the assignment, already shows about 2.0g in RES; the second run, after the assignment, shows 4.0g.
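As a quick sanity check on those figures, the growth in VIRT between the two top runs can be compared with the size of the string. This is only a sketch of the arithmetic: the two VIRT values are copied from the output above, and it assumes top is reporting VIRT in KiB, which is its default.

```perl
use strict;
use warnings;

# Size of the string from the question, converted from bytes to KiB.
my $string_kib = 2 * 1024**3 / 1024;

# VIRT values (KiB) copied from the first and second top runs above.
my ($virt_before, $virt_after) = (2110500, 4207656);

# How much the virtual size grew across the assignment.
my $growth_kib = $virt_after - $virt_before;

printf "string: %d KiB\n", $string_kib;
printf "growth: %d KiB\n", $growth_kib;
printf "diff:   %d KiB\n", $growth_kib - $string_kib;
# prints:
# string: 2097152 KiB
# growth: 2097156 KiB
# diff:   4 KiB
```

The process grew by almost exactly the size of one 2 GiB string between the two measurements.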

For comparison, running only the top command from Perl:

$ perl -e 'system "top -bn1 -p $PPID"'
top: -p requires argument

$ perl -e 'system "top -bn1 -p \$PPID"'
top - 20:27:18 up 5 days,  4:26,  1 user,  load average: 0.13, 0.17, 0.13
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.6 us,  0.0 sy,  0.0 ni, 98.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   7937.6 total,   4600.3 free,   2081.7 used,   1255.6 buff/cache
MiB Swap:   2048.0 total,     89.0 free,   1959.0 used.   5496.1 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
4142493 sam       20   0   13344   3664   3384 S   0.0   0.0   0:00.00 perl

Here it used only 3664 KiB in RES (perl -E 'say 3664/1024' => 3.578125, i.e. about 3.6 MiB).
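The first attempt fails because Perl interpolates its own (empty) $PPID variable inside the double-quoted string, so the shell never sees the name. One way to sidestep the quoting, sketched here as an alternative rather than what the question used, is to pass Perl's own process ID, $$, in list form so that no shell is involved. The helper name top_args is made up for this illustration.

```perl
use strict;
use warnings;

# Argument list for top, targeting this Perl process via $$ (Perl's own PID).
# top_args is a hypothetical helper name used only for this sketch.
sub top_args { return ('top', '-bn1', '-p', $$) }

# List-form system() runs top directly, without a shell, so nothing
# needs to be escaped.
my $status = system(top_args());
warn "top could not be run or reported an error (status $status)\n"
    if $status != 0;
```

Because the arguments never pass through a shell, there is no $PPID to escape, and the process being inspected is the perl process itself rather than the parent of an intermediate shell.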

Other information

$ perl --version

This is perl 5, version 30, subversion 0 (v5.30.0) built for x86_64-linux-gnu-thread-multi
(with 60 registered patches, see perl -V for more detail)

Copyright 1987-2019, Larry Wall
...

Solution

  • Due to constant folding, "\0"x(2*1024**3) creates a 2 GiB string at compile time. That is why the first top run, which executes before the assignment, already shows 2.0g in RES.

    $ perl -MO=Concise,-exec -e'$v = "\0" x 6;'
    1  <0> enter v
    2  <;> nextstate(main 1 -e:1) v:{
    3  <$> const[PV "\000\000\000\000\000\000"] s/FOLD
    4  <#> gvsv[*v] s
    5  <2> sassign vKS/2
    6  <@> leave[1 ref] vKP/REFC
    -e syntax OK
    

    (I'm using shorter strings in my examples, but I confirmed the same behaviour happens for "\0"x(2*1024**2).)

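    Assigning a string normally doesn't copy the string buffer, thanks to the Copy-on-Write ("COW") mechanism: the source and destination scalars share one buffer until one of them is modified.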

    $ perl -MDevel::Peek -e'$v = "abcdef"; Dump($v);'
    SV = PV(0x55cb5eec3f20) at 0x55cb5eef32e0
      REFCNT = 1
      FLAGS = (POK,IsCOW,pPOK)              <--- Copy avoided thanks to COW.
      PV = 0x55cb5ef3aaf0 "abcdef"\0
      CUR = 6
      LEN = 16
      COW_REFCNT = 1
    

    However, the COW mechanism doesn't appear to be used for folded constants.

    $ perl -MDevel::Peek -e'$v = "abc"."def"; Dump($v);'
    SV = PV(0x5632e85d3f20) at 0x5632e86030d0
      REFCNT = 1
      FLAGS = (POK,pPOK)                    <--- String buffer isn't shared.
      PV = 0x5632e85ef540 "abcdef"\0
      CUR = 6
      LEN = 16
    
    $ perl -MDevel::Peek -e'$v = "\0" x 6; Dump($v);'
    SV = PV(0x555a533d3f20) at 0x555a534031e0
      REFCNT = 1
      FLAGS = (POK,pPOK)                    <--- String buffer isn't shared.
      PV = 0x555a533ef540 "\x00\x00\x00\x00\x00\x00"\0
      CUR = 6
      LEN = 16
    

    Therefore, $v = "\0"x(2*1024**3); creates a copy of the string, so the process ends up holding both the 2 GiB folded constant and a 2 GiB copy in $v.

    This is a bug, and it has been fixed for the upcoming Perl 5.42.
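To check how a particular perl build behaves, the resident set size can be read from inside the script just before and just after the assignment. The following is a minimal sketch, not a definitive test: it assumes a Linux system with /proc/self/status, uses a 32 MiB string instead of 2 GiB to keep it light, and rss_kib is a helper name invented for this example.

```perl
use strict;
use warnings;

# Resident set size of this process in KiB, read from /proc (Linux only).
# Returns undef when /proc/self/status is unavailable or has no VmRSS line.
sub rss_kib {
    open my $fh, '<', '/proc/self/status' or return;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return;
}

our $v;
my $before = rss_kib();
$v = "\0" x (32 * 1024**2);    # 32 MiB string built from constant operands
my $after = rss_kib();

if (defined $before && defined $after) {
    printf "RSS before: %d KiB\n", $before;
    printf "RSS after:  %d KiB\n", $after;
    printf "growth:     %d KiB\n", $after - $before;
} else {
    print "VmRSS not available on this system\n";
}
```

Growth close to 32768 KiB would suggest the assignment copied the string; growth near zero would suggest the buffer was shared or reused. RSS is a coarse measure, so small differences should not be over-interpreted.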