
What is the fastest way to check the number of specific chars in a string in PHP?


I need to check whether the number of chars from a specific set in a string is higher than some number. What is the fastest way to do that?

For example, I have a long string "some text & some text & some text + a lot more + a lot more ... etc." and I need to check whether it contains more than 3 of the following symbols: [&,.,+]. So when I encounter the 4th occurrence of one of these chars, I just need to return false and stop the loop. I was thinking of writing a simple function for that, but I wonder whether there is a native method in PHP that does this. I need a function that will not waste time parsing the string to the end, because the string may be pretty long. So I think regexp and functions like count_chars are not suited for this kind of job...

Any suggestions?


Solution

  • Well, all my assumptions were wrong and my expectations were crushed by real tests. A RegExp turns out to be 2 to 7 times faster (depending on the string) than a self-made function with a simple character-checking loop.

    The code:

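    // Self-made function. $chrs is an array of single characters, e.g. ['&', '.', '+'].
    // Returns true as soon as more than $limit of them have been found in $str.
    function chk_occurs($str, $chrs, $limit) {
        $count = 0;
        $length = strlen($str);
        for ($i = 0; $i < $length; $i++) {
            if (in_array($str[$i], $chrs)) {
                $count++;
                if ($count > $limit) {
                    return true; // stop scanning at the first occurrence past the limit
                }
            }
        }
        return false;
    }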
    
    // RegExp I used for the tests.
    // Note: {3,} already matches at the third occurrence; use {4,} to detect more than 3.
    preg_match('/([&\\.\\+]|[&\\.\\+][^&\\.\\+]+?){3,}?/',$str);
    

    Of course it is faster because it is a single call to a native function, but even the same regex wrapped in a function runs 2 to ~4.8 times faster than the loop.

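    // RegExp wrapped into a function. Here $chrs is a string such as '&.+', not an array.
    function chk_occurs_preg($str, $chrs, $limit) {
        // Pass the delimiter so that a '/' in $chrs is escaped as well.
        $chrs = preg_quote($chrs, '/');
        // $limit + 1 repetitions, so the result means "more than $limit", as in chk_occurs().
        return preg_match('/([' . $chrs . ']|[' . $chrs . '][^' . $chrs . ']+?){' . ($limit + 1) . ',}?/', $str) === 1;
    }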
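
    For comparison, here is a third variant that was not part of the tests above: a minimal sketch that lets the native strcspn() skip over the runs of uninteresting characters and still stops at the first occurrence past the limit. The name chk_occurs_strcspn is made up for illustration, $chrs is assumed to be a string such as '&.+', and the type declarations assume PHP 7 or later. Its speed relative to the two functions above has not been measured here.

    ```php
    <?php
    // Hypothetical helper, not from the original answer: returns true once more
    // than $limit characters from $chrs (a string such as '&.+') have been seen.
    function chk_occurs_strcspn(string $str, string $chrs, int $limit): bool
    {
        $length = strlen($str);
        $pos = 0;
        $count = 0;
        while ($pos < $length) {
            // strcspn() returns the length of the run, starting at $pos,
            // that contains none of the characters in $chrs.
            $pos += strcspn($str, $chrs, $pos);
            if ($pos >= $length) {
                break; // reached the end without another hit
            }
            if (++$count > $limit) {
                return true; // stop early, the rest of the string is never read
            }
            $pos++; // step past the character that was just counted
        }
        return false;
    }

    var_dump(chk_occurs_strcspn('a & b & c + d.', '&.+', 3)); // four hits: bool(true)
    var_dump(chk_occurs_strcspn('a & b & c', '&.+', 3));      // two hits: bool(false)
    ```

    The loop body runs once per matching character rather than once per byte, so most of the scanning happens inside the native function.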
    

    P.S. I didn't bother to check CPU time; I only measured wall time with microtime(true) over a 200k-iteration loop, but that's enough for me.
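
    The kind of wall-time measurement described in the P.S. could look roughly like the sketch below. The helper name time_calls, the sample string, and the simplified pattern /(?:[&.+][^&.+]*){4}/ are assumptions made for illustration, not the exact code or data used in the original tests, so the numbers it prints will differ from the ratios quoted above.

    ```php
    <?php
    // Hypothetical harness: wall time, in seconds, of $iterations calls to $fn.
    function time_calls(callable $fn, int $iterations): float
    {
        $start = microtime(true);
        for ($i = 0; $i < $iterations; $i++) {
            $fn();
        }
        return microtime(true) - $start;
    }

    // Made-up sample input, not one of the strings from the original tests.
    $str = str_repeat('some text & some text + a lot more. ', 50);
    $iterations = 200000;

    // Plain character loop that exits at the 4th occurrence of &, . or +
    $loopTime = time_calls(function () use ($str) {
        $count = 0;
        $length = strlen($str);
        for ($i = 0; $i < $length; $i++) {
            if (strpos('&.+', $str[$i]) !== false && ++$count > 3) {
                return true;
            }
        }
        return false;
    }, $iterations);

    // Simplified regex: four characters from the set, anything else in between
    $regexTime = time_calls(function () use ($str) {
        return preg_match('/(?:[&.+][^&.+]*){4}/', $str) === 1;
    }, $iterations);

    printf("loop:  %.4f s\nregex: %.4f s\n", $loopTime, $regexTime);
    ```

    Wall time includes whatever else the machine is doing, so it is worth running this a few times and with several different strings before drawing conclusions.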