
Most efficient means of parsing a simple, clearly defined string?


I'm only asking because this is looping millions of times.

The string always looks like this:

01-20

It's always like that: two digits (with a leading zero), followed by a hyphen and another two digits (with a leading zero). I simply need to assign the first number (as an integer) to one variable and the second (as an integer) to another variable.

str_split? substr? explode? regex?


Solution

  • Given a variable $txt, this has the best performance:

        $a = (int)$txt;
        $b = (int)substr($txt, -2); 
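
    To see why the cast alone is enough for the first number, here is a minimal sketch. The helper name `parse_pair` is hypothetical (it is not from the question), and the code assumes the input always has the exact "DD-DD" shape described there. PHP's `(int)` cast reads the leading digits of the string and, for this input, stops at the hyphen:

    ```php
    <?php
    // Hypothetical helper wrapping the two-line solution.
    // Assumption: $txt is always two digits, a hyphen, two digits (e.g. "01-20").
    function parse_pair(string $txt): array
    {
        $a = (int)$txt;              // "01-20" -> 1: conversion stops at the hyphen
        $b = (int)substr($txt, -2);  // last two characters "20" -> 20
        return [$a, $b];
    }

    [$a, $b] = parse_pair("01-20");
    var_dump($a, $b); // int(1), int(20)
    ```

    In a loop that runs millions of times you would probably inline the two lines rather than pay for a function call; the wrapper is only there to make the behaviour easy to check.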
    

    You could measure the performance of different alternatives with a script like this:

    <?php
    $txt = "01-02";
    $test_count = 4000000;
    
    // SUBSTR -2
    $time_start = microtime(true);
    for ($x = 0; $x < $test_count; $x++) {
        $a = (int)$txt; // numeric conversion ignores second part of string.
        $b = (int)substr($txt, -2); 
    }
    $duration = round((microtime(true) - $time_start) * 1000);
    echo "substr(s,-2): {$a} {$b}, process time: {$duration}ms <br />";
    
    // SUBSTR 3, 2
    $time_start = microtime(true);
    for ($x = 0; $x < $test_count; $x++) {
        $a = (int)$txt; // numeric conversion ignores second part of string.
        $b = (int)substr($txt, 3, 2);   
    }
    $duration = round((microtime(true) - $time_start) * 1000);
    echo "substr(s,3,2): {$a} {$b}, process time: {$duration}ms <br />";
    
    // STR_SPLIT
    $time_start = microtime(true);
    for ($x = 0; $x < $test_count; $x++) {
        $arr = str_split($txt, 3);
        $a = (int)$arr[0]; // the ending hyphen does not break the numeric conversion
        $b = (int)$arr[1];
    }
    $duration = round((microtime(true) - $time_start) * 1000);
    echo "str_split(s,3): {$a} {$b}, process time: {$duration}ms <br />";
    
    // EXPLODE
    $time_start = microtime(true);
    for ($x = 0; $x < $test_count; $x++) {
        $arr = explode('-', $txt);
        $a = (int)$arr[0];
        $b = (int)$arr[1];
    }
    $duration = round((microtime(true) - $time_start) * 1000);
    echo "explode('-',s): {$a} {$b}, process time: {$duration}ms <br />";
    
    // PREG_MATCH
    $time_start = microtime(true);
    for ($x = 0; $x < $test_count; $x++) {
        preg_match('/(..).(..)/', $txt, $arr);
        $a = (int)$arr[1];
        $b = (int)$arr[2];
    }
    $duration = round((microtime(true) - $time_start) * 1000);
    echo "preg_match('/(..).(..)/',s): {$a} {$b}, process time: {$duration}ms <br />";
    ?>
    

    When I ran this on PhpFiddle Lite I got results like this:

    substr(s,-2): 1 2, process time: 851ms
    substr(s,3,2): 1 2, process time: 971ms
    str_split(s,3): 1 2, process time: 1568ms
    explode('-',s): 1 2, process time: 1670ms
    preg_match('/(..).(..)/',s): 1 2, process time: 3328ms 
    

    With a single call, substr performs almost equally well with either (s, -2) or (s, 3, 2) as arguments; sometimes the second version came out as the winner. str_split and explode perform close to each other, but not as well, and preg_match is the clear loser. The results depend on the server load, so you should try this on your own set-up. What is clear is that regular expressions carry a heavy overhead: avoid them when the other string functions can do the job.

    I edited my answer when I realised that you can cast the original string to int directly, which ignores the part it cannot parse. In practice this means you get the first part as a number without calling any string function at all. This was decisive in making substr the absolute winner!
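
    Since the timings shift with server load, one way to get steadier numbers on your own set-up is to repeat each measurement and keep the fastest run. The following is only a sketch under my own assumptions: `bench` is a hypothetical helper, the iteration and run counts are arbitrary, and `hrtime()` needs PHP 7.3 or later. Wrapping each candidate in a closure adds the same call overhead to all of them, so only the relative order is meaningful, not the absolute times.

    ```php
    <?php
    // Hypothetical helper: calls $fn $iterations times, repeats that $runs times,
    // and returns the fastest run in milliseconds to dampen server-load noise.
    function bench(callable $fn, int $iterations = 200000, int $runs = 3): float
    {
        $best = INF;
        for ($r = 0; $r < $runs; $r++) {
            $start = hrtime(true); // monotonic clock in nanoseconds (PHP 7.3+)
            for ($i = 0; $i < $iterations; $i++) {
                $fn("01-20");
            }
            $best = min($best, (hrtime(true) - $start) / 1e6); // ns -> ms
        }
        return $best;
    }

    // Candidates taken from the script above, each wrapped in a closure.
    $candidates = [
        'cast + substr' => function (string $s): array {
            return [(int)$s, (int)substr($s, -2)];
        },
        'explode' => function (string $s): array {
            $p = explode('-', $s);
            return [(int)$p[0], (int)$p[1]];
        },
        'preg_match' => function (string $s): array {
            preg_match('/(..).(..)/', $s, $m);
            return [(int)$m[1], (int)$m[2]];
        },
    ];

    foreach ($candidates as $name => $fn) {
        printf("%-14s %8.1f ms\n", $name, bench($fn));
    }
    ```

    Taking the minimum rather than the average is a design choice: the fastest run is the one least disturbed by whatever else the server was doing.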