Tags: php, preg-match, preg-split, mbstring

PHP: get only the first word from a multibyte string


I used preg_match, but it returns pdf, perhaps because that is the only English (ASCII) part of the string.

But I want to get only 練馬春日町Ⅳ.

Is there a way to do this for a multibyte string?

<?php 
// Initialize a sentence to a variable 
$sentence = '練馬春日町Ⅳ 清掃レポート.pdf'; 

// Use preg_match() function to get the 
// first word of a string 
preg_match('/\b\w+\b/i', $sentence, $result);  

// Display result 
echo "The first word of string is: ".$result[0]; 

?>



Solution

  • To make your code work, you just need to add the u flag to the regex, so that the pattern and subject are treated as UTF-8 and \w matches Unicode letters and digits:

    preg_match('/^\w+/iu', $sentence, $result);  
    echo "\nThe first word of string is: ".$result[0];
    

    Output:

    The first word of string is: 練馬春日町Ⅳ
    

    Note that since you want the first word, you can simply anchor your regex with ^. The second \b is not required, because \w+ will match as many word characters as it can, i.e. until it reaches the first non-word character. The i flag has no effect here, since the pattern contains no letters.
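One thing to watch with the anchored pattern: if the string starts with a space or punctuation, ^\w+ matches nothing and $result[0] is undefined. The following is a minimal sketch of a guard against that; first_word is a hypothetical helper name, not part of the original answer, and it drops the ^ anchor so that leading non-word characters are skipped.

```php
<?php
// Hypothetical helper (an assumption for illustration): returns the first
// run of word characters in $text, or null when there is none.
function first_word(string $text): ?string
{
    // u: treat pattern and subject as UTF-8, so \w covers Unicode
    // letters and digits instead of only ASCII ones.
    if (preg_match('/\w+/u', $text, $m) === 1) {
        return $m[0];
    }
    return null;
}

$sentence = '練馬春日町Ⅳ 清掃レポート.pdf';

// With the u flag the Japanese word is found first.
echo first_word($sentence), "\n";

// No word characters at all: the helper returns null instead of
// triggering an "undefined array key" warning.
var_dump(first_word('...'));
```

Whether to keep the ^ anchor is a design choice: with it, a string that does not begin with a word character yields no result; without it, the first word anywhere in the string is returned.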

    Alternatively, you can use mb_split with a regex of \p{Z}, which matches any Unicode separator character (spaces, including the ideographic space U+3000, but not tabs or newlines):

    $sentence = '練馬春日町Ⅳ 清掃レポート.pdf'; 
    $first_word = mb_split('\p{Z}', $sentence);
    echo $first_word[0];
    

    Output:

    練馬春日町Ⅳ
    

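The split approach can also be written with preg_split, which accepts a limit so the rest of the string is not tokenized. This is a sketch under stated assumptions rather than the answer's own method: first_token is a hypothetical helper name, and the character class adds \s so that tabs and newlines count as delimiters too.

```php
<?php
// Hypothetical helper (an assumption for illustration): returns everything
// before the first whitespace/separator, or null if the string has no token.
function first_token(string $text): ?string
{
    // \p{Z} matches Unicode separators (e.g. U+3000 ideographic space);
    // \s adds tabs and newlines. Limit 2 stops after the first split, and
    // PREG_SPLIT_NO_EMPTY drops the empty piece caused by leading whitespace.
    $parts = preg_split('/[\p{Z}\s]+/u', $text, 2, PREG_SPLIT_NO_EMPTY);
    if ($parts === false || $parts === []) {
        return null; // regex error (e.g. invalid UTF-8) or no token
    }
    return $parts[0];
}

echo first_token('練馬春日町Ⅳ 清掃レポート.pdf'), "\n";
// Same input, but separated by a full-width (ideographic) space.
echo first_token("練馬春日町Ⅳ\u{3000}清掃レポート.pdf"), "\n";
```

Note that splitting on whitespace and matching \w+ are not equivalent: for an input such as report.pdf, the split returns report.pdf, while \w+ returns only report.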