php regex php-7 camelcasing php-7.1

What is the best (cheapest) way to CamelCase complex input strings?


I have a large number of real-time incoming phrases that need to be transformed to letters only, CamelCased at every word boundary and at every point where a non-letter character splits a word.

This is what I have come up with so far. Is there a cheaper and faster way to perform this task?

function FoxJourneyLikeACamelsHump(string $string): string {
  $string = preg_replace("/[^[:alpha:][:space:]]/u", ' ', $string);
  $string = ucwords($string);
  $camelCase = preg_replace('/\s+/', '', $string);
  return $camelCase;
}

// $expected = "ThQuCkBrWnFXJumpsVRThLZyDG";
$string = " Th3 qu!ck br0wn f0x jumps 0v3r th3 l@zy d0g. ";
$is = FoxJourneyLikeACamelsHump($string);

Results:

Sentences: 100000000
Total time: 40.844197034836 seconds
Average: 0.000000408 seconds per sentence


Solution

  • Your code is already quite efficient. You can still improve it with two tweaks:

    • Pass the delimiter to ucwords so that it does not have to check for \t, \n, etc. This assumes the input contains no whitespace other than plain spaces: the first step keeps everything matched by [:space:], so tabs and newlines would survive it. On average this gives about a 1% improvement;
    • Perform the last step with a plain (non-regex) string replace on the space character. This gives up to a 20% improvement.

    Code:

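    function FoxJourneyLikeACamelsHump(string $string): string {
        // Turn every character that is neither a letter nor whitespace into a space.
        $string = preg_replace("/[^[:alpha:][:space:]]/u", ' ', $string);
        // Capitalize the first letter of each space-delimited fragment.
        $string = ucwords($string, ' ');
        // Drop the spaces with a plain string replace instead of a regex.
        $camelCase = str_replace(' ', '', $string);
        return $camelCase;
    }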
    

    See the timings for the original and improved version on rextester.com.
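
For readers who want to reproduce the comparison locally, here is a minimal timing sketch. The function names, the benchmark helper and the iteration count are made up for illustration and are not taken from the rextester page. Absolute numbers will depend on the machine and PHP version.

```php
<?php
// Original version from the question: regex for both replacement steps.
function camelCaseRegex(string $string): string {
    $string = preg_replace("/[^[:alpha:][:space:]]/u", ' ', $string);
    $string = ucwords($string);
    return preg_replace('/\s+/', '', $string);
}

// Tweaked version: explicit ucwords delimiter and a plain string replace.
function camelCaseStrReplace(string $string): string {
    $string = preg_replace("/[^[:alpha:][:space:]]/u", ' ', $string);
    $string = ucwords($string, ' ');
    return str_replace(' ', '', $string);
}

// Hypothetical helper: seconds taken to call $fn on $input $iterations times.
function benchmark(callable $fn, string $input, int $iterations): float {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($input);
    }
    return microtime(true) - $start;
}

$input = " Th3 qu!ck br0wn f0x jumps 0v3r th3 l@zy d0g. ";
$iterations = 100000; // assumed count; raise it for steadier numbers

foreach (['camelCaseRegex', 'camelCaseStrReplace'] as $fn) {
    $seconds = benchmark($fn, $input, $iterations);
    printf("%-20s %.4f s total, %.9f s per call\n", $fn, $seconds, $seconds / $iterations);
}
```

Run each variant several times and compare the per-call figures. A single run is easily skewed by whatever else the machine is doing.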

    Note: Because you use ucwords, which is not multibyte-aware, your code cannot be relied on for Unicode strings in general. To cover those you would need a function like mb_convert_case:

    $string = mb_convert_case($string,  MB_CASE_TITLE);
    

    ... but this has a performance impact.
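
Put together, a multibyte-aware variant could look like the sketch below. The function name is hypothetical, and the code assumes that the mbstring extension is loaded and that input is UTF-8. Note that MB_CASE_TITLE also lowercases the remaining letters of each word, which ucwords does not do, so mixed-case input comes out differently.

```php
<?php
// Hypothetical multibyte-aware variant; requires the mbstring extension
// and assumes UTF-8 input.
function camelCaseMultibyte(string $string): string {
    // Turn every character that is neither a letter nor whitespace into a space.
    $string = preg_replace("/[^[:alpha:][:space:]]/u", ' ', $string);
    // Title-case each word; this also lowercases the rest of the word.
    $string = mb_convert_case($string, MB_CASE_TITLE, 'UTF-8');
    // Drop the spaces that separated the fragments.
    return str_replace(' ', '', $string);
}

echo camelCaseMultibyte("élan vital!"), "\n";  // ÉlanVital
echo camelCaseMultibyte("hello wORLD"), "\n";  // HelloWorld
```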