php · arrays · php-internals

Why hashtable (array implementation) compaction does not kick in in PHP?


I am building a big PHP array and then removing most of its elements from the beginning. As we know, this won't free the memory; it only marks the removed items as UNDEF, as explained by Nikita here. However, when I add more items to the array, it should reach the compaction point and we should see a memory reduction, but that is not what I observe.

Can you help me understand why compaction is not happening?

<?php

$items = [];

// hashtable compaction should start at every 2^n size.
$size = pow(2, 18);

for ($i = 0; $i < $size; $i++) {
    $items[$i] = "string $i";
}
// at this point we have reached the maximum table size before compaction;
// 2^18 is used to make the memory consumption visible enough.
printf("mem: %d MB \n", memory_get_usage(true) / 1024 / 1024);

// Remove all elements except the last one
for ($i = 0; $i < $size - 1; $i++) {
    unset($items[$i]);
}

// here we expect the same memory footprint, as nothing is freed,
// only marked as UNDEF.
printf("mem: %d MB \n", memory_get_usage(true) / 1024 / 1024);

// However, here we expect to hit the compaction point and see $size-1
// elements removed while 1 is added, so memory is expected to go down
$items[] = $i;
printf("mem: %d MB \n", memory_get_usage(true) / 1024 / 1024);

Live example: https://onlinephp.io/c/98970


Solution

  • Because, as Nikic said in the article:

    One problem with the current implementation is that arData never shrinks (unless explicitly told to).

    This indicates that the so-called compaction only reorganizes the data without reducing memory usage.

    For packed hashtables, the author also said:

    In packed hashtables the arHash array is NULL and lookups will directly index into arData. If you’re looking for the key 5 then the element will be located at arData[5] or it doesn’t exist at all.

    I think that PHP directly uses arData as a regular array, so packed hashtables will not be compacted.
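    The packed layout can be observed indirectly from userland. This is my own probe, not from the article: inserting a key far out of sequence forces a packed hashtable to be converted to a regular one, and since a Bucket element is larger than a zval (and the conversion also allocates the arHash index), the switch shows up as a jump in memory usage.

```php
<?php

$n = 2 ** 18;

$a = [];
for ($i = 0; $i < $n; $i++) {
    $a[$i] = $i;            // sequential integer keys keep the table packed
}
$packedMem = memory_get_usage();
printf("packed:    %.1f MB\n", $packedMem / 1024 / 1024);

// An out-of-range key forces the packed-to-regular (hash) conversion.
$a[PHP_INT_MAX] = 0;
$regularMem = memory_get_usage();
printf("converted: %.1f MB\n", $regularMem / 1024 / 1024);
```

    The exact numbers depend on the PHP version, but the converted table should be noticeably larger than the packed one.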

    In the current source code, it can be seen that the packed hashtable's data array is now declared as zval *arPacked;, but it still never shrinks.

    However, because a Bucket is larger than a zval, there is an exception: when a regular hashtable is converted to a packed hashtable, memory usage is reduced (by about half). Some functions such as sort can achieve this (surprisingly, shuffle can as well).
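    Under that reading, the memory in the question's script can be reclaimed by forcing such a rebuild. A minimal sketch of the scenario with sort() added at the end (variable names are mine; the exact figures depend on the PHP version):

```php
<?php

// Recreate the question's scenario: after unsetting almost everything,
// the oversized arData allocation is still held by the array.
$items = [];
$size = 2 ** 18;
for ($i = 0; $i < $size; $i++) {
    $items[$i] = "string $i";
}
for ($i = 0; $i < $size - 1; $i++) {
    unset($items[$i]);
}

$beforeSort = memory_get_usage();
printf("before sort: %.1f MB\n", $beforeSort / 1024 / 1024);

// sort() rebuilds the single remaining element into a fresh table,
// which is where the memory reduction described above can appear.
sort($items);

$afterSort = memory_get_usage();
printf("after sort:  %.1f MB\n", $afterSort / 1024 / 1024);
```

    The remaining element survives the rebuild under key 0, since sort() discards the old keys and renumbers.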