When is foreach with a parameter by reference dangerous?


I knew that iterating over array items by reference in foreach can be dangerous.

In particular, one must not reuse the variable that was passed by reference, because it affects the $array, like in this example:

$array = ['test'];
foreach ($array as &$item){
    $item = $item;
}
$item = 'modified';
var_dump($array);

array(1) { [0]=> &string(8) "modified" }

Now this bit me: the content of the array gets modified inside the function should_not_modify, even though I pass $array by value, not by reference.

function should_not_modify($array){
    foreach($array as &$item){
        $item = 'modified';
    }
}
$array = ['test'];
foreach ($array as &$item){
    $item = (string)$item;
}
should_not_modify($array);
var_dump($array);

array(1) { [0]=> &string(8) "modified" }

I'm tempted to go through my whole codebase and insert unset($item); after each foreach($array as &$item).

But since this is a big task and introduces a potentially useless line, I would like to know if there is a simple rule for when foreach($array as &$item) is safe without an unset($item); after it, and when it is not.

Edit for clarification

I think I understand what happens and why. I also know the best way to guard against it: foreach($array as &$item){...}; unset($item);

I know that this is dangerous after foreach($array as &$item):

  • reuse the variable $item
  • pass the array to a function

My question is: are there other cases that are dangerous, and can we build an exhaustive list of what is dangerous? Or the other way round: is it possible to describe when it is not dangerous?


Solution

    About foreach

    First of all, some (maybe obvious) clarifications about two behaviors of PHP:

    1. foreach($array as $item) leaves the variable $item defined after the loop; it still holds the last value. If the variable is a reference, as in foreach($array as &$item), it still "points" to the last element of the array after the loop.

    2. When a variable is a reference, an assignment such as $item = 'foo'; changes whatever the reference points to, not just the variable $item itself. This also holds for a subsequent foreach($array2 as $item), which treats $item as a reference if it was created as one and therefore modifies whatever it points to (here, the last element of the array used in the previous foreach).

    Obviously this is very error-prone, which is why you should always unset the reference used in a foreach so that later writes do not modify the last element (as in example #10 of the doc for the type array).
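
    To make the second point concrete, here is a minimal sketch. The variable names and values ($first, $second) are made up for illustration and are not taken from the question:

```php
<?php
// Hypothetical data, chosen only to make the effect visible.
$first  = ['a', 'b', 'c'];
$second = ['x', 'y'];

foreach ($first as &$item) {
    // Nothing to do here; iterating by reference is enough.
}
// $item is still a reference to $first[2] at this point.

foreach ($second as $item) {
    // Each iteration assigns the current value to $item,
    // which writes it through the reference into $first[2].
}

// $first is now ['a', 'b', 'y']; $second is unchanged.
```

    The second loop never mentions $first, yet its last element ends up holding the last value of $second.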
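
    A short sketch of that habit, again with made-up names and values, could look like this:

```php
<?php
// Hypothetical data for illustration.
$first  = ['a', 'b', 'c'];
$second = ['x', 'y'];

foreach ($first as &$item) {
    $item = strtoupper($item); // intended in-place modification
}
unset($item); // break the link between $item and $first[2]

foreach ($second as $item) {
    // $item is an ordinary variable again, so $first is not touched.
}

// $first is ['A', 'B', 'C'] and $item holds 'y'.
```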

    About the function that modifies the array

    It's worth noting that - as pointed out in a comment by @iainn - the behavior in your example has nothing to do with foreach. The mere existence of a reference to an element of the array will allow this element to be modified. Example:

    function should_not_modify($array){
        $array[0] = 'modified';
        $array[1] = 'modified2';
    }
    $array = ['test', 'test2'];
    $item = & $array[0];
    
    should_not_modify($array);
    var_dump($array);
    

    Will output:

    array(2) {
      [0] =>
      string(8) "modified"
      [1] =>
      string(5) "test2"
    }
    

    This is admittedly very surprising, but it is explained in the PHP documentation, "What References Do":

    Note, however, that references inside arrays are potentially dangerous. Doing a normal (not by reference) assignment with a reference on the right side does not turn the left side into a reference, but references inside arrays are preserved in these normal assignments. This also applies to function calls where the array is passed by value. [...] In other words, the reference behavior of arrays is defined in an element-by-element basis; the reference behavior of individual elements is dissociated from the reference status of the array container.

    With the following example (copy/pasted):

    /* Assignment of array variables */
    $arr = array(1);
    $a =& $arr[0]; //$a and $arr[0] are in the same reference set
    $arr2 = $arr; //not an assignment-by-reference!
    $arr2[0]++;
    /* $a == 2, $arr == array(2) */
    /* The contents of $arr are changed even though it's not a reference! */
    

    It's important to understand that when creating a reference, for example $a = &$b, both $a and $b are equal. $a is not pointing to $b or vice versa. $a and $b are pointing to the same place.

    So when you do $item = & $array[0]; you actually make $array[0] point to the same place as $item. Since $item is a global variable and references inside arrays are preserved, modifying $array[0] from anywhere (even from within the function) modifies it globally.
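
    The sketch below contrasts the two situations side by side. The helper overwrite_first and the variable names are hypothetical, and the behavior described assumes PHP 7 or later:

```php
<?php
// Hypothetical helper: receives the array by value and writes to its copy.
function overwrite_first(array $array): void
{
    $array[0] = 'modified';
}

// Case 1: a second name for element 0 is still alive during the call.
$withRef = ['test'];
$ref = &$withRef[0];
overwrite_first($withRef);
// The write went through the shared reference: $withRef[0] is 'modified'.

// Case 2: the extra name is removed before the call.
$withoutRef = ['test'];
$tmp = &$withoutRef[0];
unset($tmp);
overwrite_first($withoutRef);
// Only the function's copy was changed: $withoutRef[0] is still 'test'.
```

    The only difference between the two cases is whether another variable still shares the element when the array is copied.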

    Conclusion

    Are there other cases that are dangerous, and can we build an exhaustive list of what is dangerous? Or the other way round: is it possible to describe when it is not dangerous?

    I'm going to repeat the quote from the PHP doc again: "references inside arrays are potentially dangerous".

    So no, it's not possible to describe when it is not dangerous, because it is never not dangerous. It's too easy to forget that $item has been created as a reference (or that a global reference has been created and not destroyed), reuse it elsewhere in your code, and corrupt the array. This has long been a topic of debate (in this bug for example), and people call it either a bug or a feature...
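
    If you would rather not create the reference at all, one possible approach (a sketch, not the only option) is to write back through the key. The data here is illustrative:

```php
<?php
// Hypothetical data for illustration.
$array = ['test', 'test2'];

// Write back through the key instead of binding $item by reference.
foreach ($array as $key => $item) {
    $array[$key] = strtoupper($item);
}

$item = 'modified'; // plain variable, so the array is not affected

// $array is ['TEST', 'TEST2'].
```

    No element of $array is ever turned into a reference here, so there is nothing to unset and nothing that survives into a later function call.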