php regex preg-replace strpos

Is using strpos() before preg_replace() faster?


Say we use this preg_replace on millions of post strings:

function makeClickableLinks($s) {
    return preg_replace('@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@', '<a href="$1" target="_blank">$1</a>', $s);
}

Assume that only 10% of all the posts contain links. Would it be faster to check strpos($string, 'http') !== false before calling preg_replace()? If so, why? Doesn't preg_replace() perform some pretests internally?


Solution

  • Surprisingly, yes, provided most of the strings don't contain "http".

    Here are benchmarks of both approaches, each run for 10,000,000 calls:

    Test 1 - String that matches the pattern:

    "Here is a great new site to visit at http://example.com so go there now!"
    

    preg_replace alone took 10.9626309872 seconds
    strpos before preg_replace took 12.6124269962 seconds ← slower

    Test 2 - String that doesn't match the pattern:

    "Here is a great new site to visit at ftp://example.com so go there now!"
    

    preg_replace alone took 6.51636195183 seconds
    strpos before preg_replace took 2.91205692291 seconds ← faster

    Test 3 - 10% of the strings match the pattern:

    "Here is a great new site to visit at ftp://example.com so go there now!" (90%)
    "Here is a great new site to visit at http://example.com so go there now!" (10%)
    

    preg_replace alone took 7.43295097351 seconds
    strpos before preg_replace took 4.31978201866 seconds ← faster

    It's just a simple benchmark on two strings, but there is a clear difference in speed.
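
    To see how the trade-off shifts with the share of matching strings, the timings from Tests 1 and 2 can be combined linearly. This is only a rough sketch: it assumes the cost per call is independent of the mix, and estimate() and the variable names are made up for illustration.

    ```php
    <?php
    // Timings in seconds per 10,000,000 calls, copied from Tests 1 and 2 above.
    $matchPlain     = 10.9626309872; // preg_replace alone, string contains a link
    $matchGuarded   = 12.6124269962; // strpos + preg_replace, string contains a link
    $noMatchPlain   = 6.51636195183; // preg_replace alone, no link
    $noMatchGuarded = 2.91205692291; // strpos + preg_replace, no link

    // Hypothetical helper: estimated total time when a fraction $p of the strings match.
    function estimate($p, $match, $noMatch) {
        return $p * $match + (1 - $p) * $noMatch;
    }

    // Time saved per non-matching batch vs. time lost per matching batch.
    $saving  = $noMatchPlain - $noMatchGuarded;
    $penalty = $matchGuarded - $matchPlain;

    // Fraction of matching strings at which both approaches cost the same.
    $breakEven = $saving / ($saving + $penalty);

    printf("estimated plain at 10%%:   %.2f s\n", estimate(0.1, $matchPlain, $noMatchPlain));
    printf("estimated guarded at 10%%: %.2f s\n", estimate(0.1, $matchGuarded, $noMatchGuarded));
    printf("break-even match fraction: %.3f\n", $breakEven);
    ```

    With these numbers the break-even point comes out at roughly 69% matching strings, and the 10% estimates land within about half a second of the measured Test 3 results. The figures depend on this machine, PHP version, and pair of strings.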


    Here is the test harness for the "10%" case:

    <?php
    $string1 = "Here is a great new site to visit at http://example.com so go there now!";
    $string2 = "Here is a great new site to visit at ftp://example.com so go there now!";
    
    function makeClickableLinks1($s) {
        return preg_replace('@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@', '<a href="$1" target="_blank">$1</a>', $s);
    }
    
    function makeClickableLinks2($s) {
        // Return the input unchanged (not null) when there is nothing to replace
        return strpos($s, 'http') !== false ? preg_replace('@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@', '<a href="$1" target="_blank">$1</a>', $s) : $s;
    }
    
    /* Begin test harness */
    
    $loops = 10000000;
    
    function microtime_float() {
        // microtime(true) returns the current Unix timestamp as a float
        return microtime(true);
    }
    
    /* Test using only preg_replace */
    
    $time_start = microtime_float();
    for($i = 0; $i < $loops; $i++) {
        // Only 10% of strings will have "http"
        makeClickableLinks1($i % 10 ? $string2 : $string1);
    }
    $time_end = microtime_float();
    $time = $time_end - $time_start;
    echo "preg_replace alone took $time seconds<br/>";
    
    /* Test using strpos before preg_replace */
    
    $time_start = microtime_float();
    for($i = 0; $i < $loops; $i++) {
        // Only 10% of strings will have "http"
        makeClickableLinks2($i % 10 ? $string2 : $string1);
    }
    $time_end = microtime_float();
    $time = $time_end - $time_start;
    echo "strpos before preg_replace took $time seconds<br/>";
    ?>
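
    Before relying on the shortcut, it is worth checking that the guarded version returns exactly what preg_replace() alone would. Every match of the pattern has to start with "http", so skipping strings without it should never change the result. Below is a minimal sketch of such a check; LINK_PATTERN, makeClickableLinksPlain(), and makeClickableLinksGuarded() are illustrative names, and the guarded function returns the original string when it skips the regex.

    ```php
    <?php
    // Hypothetical constant name; the pattern itself is the one from the question.
    const LINK_PATTERN = '@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@';

    // Baseline: always run the regex.
    function makeClickableLinksPlain($s) {
        return preg_replace(LINK_PATTERN, '<a href="$1" target="_blank">$1</a>', $s);
    }

    // Guarded: skip the regex when the literal prefix "http" is absent.
    function makeClickableLinksGuarded($s) {
        if (strpos($s, 'http') === false) {
            return $s; // nothing can match, so hand the string back untouched
        }
        return preg_replace(LINK_PATTERN, '<a href="$1" target="_blank">$1</a>', $s);
    }

    $samples = array(
        "Here is a great new site to visit at http://example.com so go there now!",
        "Here is a great new site to visit at ftp://example.com so go there now!",
        "Plain text that mentions http without an actual URL",
    );

    // Both functions should agree on every sample.
    foreach ($samples as $s) {
        var_dump(makeClickableLinksGuarded($s) === makeClickableLinksPlain($s));
    }
    ```

    The check is case-sensitive, like the pattern itself, so a post containing "HTTP://..." is left alone by both versions.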