php, regex, tokenize

preg_replace: add a space before and after punctuation characters


I have a word that contains some punctuation characters.

$word = "'Ankara'da!?'";

I want to put spaces before and after punctuation characters, except for an apostrophe in the middle of the word. In the result there must be exactly one space between a letter and a punctuation character, or between two punctuation characters.

Required result: ' Ankara'da ! ? '

I tried the code below and added the accented Turkish characters explicitly (because \w didn't work):

preg_replace('/(?![a-zA-Z0-9ğüışöçİĞÜŞÖÇ])/ig', " ", $word);

Result: 'Ankara 'da ! ? '


Solution

  • If you only need to add single spaces around punctuation symbols and avoid adding them at the start/end of the string, you can use the following solution:

    $word = "'Ankara'da!?'";
    echo trim(preg_replace_callback('~\b\'\b(*SKIP)(*F)|\s*(\p{P}+)\s*~u', function($m) {
        return ' ' . preg_replace('~\X(?=\X)~u', '$0 ', $m[1]) . ' ';
    }, $word)); // => ' Ankara'da ! ? '
    


    The \b\'\b(*SKIP)(*F) part matches every ' that is enclosed with word chars (letters, digits, underscores, and some more rarely used word chars) and skips it, so those apostrophes are left untouched. The \s*(\p{P}+)\s* part matches 0+ whitespaces, captures 1+ punctuation symbols (including _) into Group 1, and then matches any 0+ trailing whitespaces. In the callback, a single space is added after each extended grapheme cluster (\X) in Group 1 that is followed by another one ((?=\X)), and the whole run is wrapped in spaces. The outer leading/trailing spaces are then removed with trim().
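To make the steps above easier to reuse and check, the one-liner can be wrapped in a function. This is only a sketch: the name space_punctuation() is hypothetical and not part of the original answer, and the second input is an extra illustration.

```php
<?php
// Hypothetical helper (name is an assumption): wraps the callback solution above.
function space_punctuation(string $text): string
{
    $spaced = preg_replace_callback(
        // Alternative 1: an apostrophe between two word chars is matched, then skipped.
        // Alternative 2: a run of punctuation with optional surrounding whitespace.
        '~\b\'\b(*SKIP)(*F)|\s*(\p{P}+)\s*~u',
        function (array $m): string {
            // Insert a space after every grapheme that is followed by another one,
            // then wrap the whole run in single spaces.
            return ' ' . preg_replace('~\X(?=\X)~u', '$0 ', $m[1]) . ' ';
        },
        $text
    );

    // Drop the spaces that were added at the very start and end.
    return trim($spaced);
}

echo space_punctuation("'Ankara'da!?'"), "\n"; // ' Ankara'da ! ? '
echo space_punctuation("Hello,world!"), "\n";  // Hello , world !
```

Whitespace that already surrounds a punctuation run is consumed by the \s* parts, so the result should not contain double spaces around punctuation.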

    There is also a way to do that with two nested preg_replace calls:

    $word = "'Ankara'da!?'";
    echo preg_replace('~^\s+|\s+$|(\s){2,}~u', '$1', 
        preg_replace('~(?!\b\'\b)\p{P}~u', ' $0 ', $word)
    );
    


    The '~(?!\b\'\b)\p{P}~u' pattern matches any punctuation character that is not a ' enclosed with word chars, and the replacement ' $0 ' surrounds it with spaces. The '~^\s+|\s+$|(\s){2,}~u' pattern then removes all whitespace at the start/end of the string and shrinks every run of 2+ whitespaces elsewhere to a single one (the last whitespace of the run, kept in Group 1).
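The two passes are easier to follow when the intermediate string is visible. The sketch below splits the nested call into two hypothetical helpers, pad_punctuation() and normalize_spaces(). Both names are assumptions made for illustration; the patterns are the ones from the answer.

```php
<?php
// Hypothetical helper: pass 1, surround each punctuation character with spaces,
// except an apostrophe that has a word char on both sides.
function pad_punctuation(string $text): string
{
    return preg_replace('~(?!\b\'\b)\p{P}~u', ' $0 ', $text);
}

// Hypothetical helper: pass 2, strip leading/trailing whitespace and
// collapse every run of 2+ whitespace chars into one.
function normalize_spaces(string $text): string
{
    return preg_replace('~^\s+|\s+$|(\s){2,}~u', '$1', $text);
}

$word   = "'Ankara'da!?'";
$padded = pad_punctuation($word);
var_dump($padded);                   // " ' Ankara'da !  ?  ' " (note the double spaces)
var_dump(normalize_spaces($padded)); // "' Ankara'da ! ? '"
```

Compared with the callback version, this variant pads every punctuation character separately and relies on the second pass to clean up the extra spaces.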