Which is faster: in_array or isset?


This question is mostly for my own benefit, as I like to write optimized code that also runs well on cheap, slow servers (or servers with a lot of traffic).

I looked around and was not able to find an answer. Which of these two examples is faster, keeping in mind that the array's keys are not important in my case (pseudo-code, naturally)?

<?php
$a = array();
while($new_val = 'get over 100k email addresses already lowercased'){
    if(!in_array($new_val, $a)){
        $a[] = $new_val;
        //do other stuff
    }
}
?>

<?php
$a = array();
while($new_val = 'get over 100k email addresses already lowercased'){
    if(!isset($a[$new_val])){
        $a[$new_val] = true;
        //do other stuff
    }
}
?>

The point of the question is not key collisions, but I would like to add that if you are worried about colliding inserts for $a[$new_val], you can use $a[md5($new_val)] instead. It can still cause collisions, but it would make a possible DoS attack harder when reading from a user-provided file (see http://nikic.github.com/2011/12/28/Supercolliding-a-PHP-array.html).
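A minimal sketch of that md5-keyed variant, assuming a small hypothetical `$emails` list in place of the real input. Because the key no longer holds the address itself, the sketch stores the original value under the hashed key so it can be recovered afterwards; that is a choice made for this illustration, not something the loop above requires.

```php
<?php
// Hypothetical sample input; the real case would be 100k+ lowercased addresses.
$emails = ['a@example.com', 'b@example.com', 'a@example.com', 'c@example.com', 'b@example.com'];

$seen = []; // md5(email) => original email
foreach ($emails as $new_val) {
    $key = md5($new_val); // 32-character hex string used as the array key
    if (!isset($seen[$key])) {
        $seen[$key] = $new_val; // keep the original value alongside the hashed key
        // do other stuff
    }
}

// Drop the hashed keys to get the de-duplicated list back.
$unique = array_values($seen);
print_r($unique);
```

The extra md5() call per item adds some cost, so whether the trade is worthwhile depends on how much the input is trusted.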


Solution

  • The answers so far are spot-on. Using isset in this case is faster because:

    • It uses an O(1) hash lookup on the key, whereas in_array must check every value until it finds a match, which is O(n) in the worst case.
    • isset is a language construct compiled to its own opcode, so it has less overhead than calling the built-in in_array function.

    Both effects can be demonstrated by using a populated array (10,000 values in the test below), which forces in_array to do more searching.

    isset:    0.009623
    in_array: 1.738441
    

    This builds on Jason's benchmark by filling the array with random values and occasionally looking up a value that actually exists in it. Everything is random, so expect the times to fluctuate between runs.

    $a = array();
    for ($i = 0; $i < 10000; ++$i) {
        $v = rand(1, 1000000);
        $a[$v] = $v;
    }
    echo "Size: ", count($a), PHP_EOL;
    
    $start = microtime( true );
    
    for ($i = 0; $i < 10000; ++$i) {
        isset($a[rand(1, 1000000)]);
    }
    
    $total_time = microtime( true ) - $start;
    echo "isset:    ", number_format($total_time, 6), PHP_EOL;
    
    $start = microtime( true );
    
    for ($i = 0; $i < 10000; ++$i) {
        in_array(rand(1, 1000000), $a);
    }
    
    $total_time = microtime( true ) - $start;
    echo "in_array: ", number_format($total_time, 6), PHP_EOL;
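
If the data already exists as a plain list of values, one way to get the O(1) key lookup described above is to flip the list once with array_flip() and then use isset on the result. The following is only a sketch; `$values` and `$lookup` are hypothetical names used for illustration.

```php
<?php
// Hypothetical list of values, standing in for data that already exists as a plain array.
$values = ['a@example.com', 'b@example.com', 'c@example.com'];

// array_flip() swaps keys and values, so each value becomes a key.
// It only works when the values are integers or strings.
$lookup = array_flip($values);

var_dump(isset($lookup['b@example.com']));          // bool(true)
var_dump(isset($lookup['z@example.com']));          // bool(false)
var_dump(in_array('b@example.com', $values, true)); // bool(true), found by scanning the values
```

The flip itself walks the whole array once, so it only pays off when many lookups follow, as in the loop from the question.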