php regex string punctuation

Cleaning up a string with punctuation mistakes and missing spaces


Hi, I have the following string:


TheLion is walking(proudly) through theJungle,but he misses hisTeddy.1very sad day!It is VegeterianDay,too. How can we help him?Maybe withBambi&a good song! Or with bread & butter;What do you think:agree?

I need this:

The Lion is walking (proudly) through the Jungle, but he misses his Teddy. 1 very sad day! It is Vegeterian Day, too. How can we help him? Maybe with Bambi & a good song! Or with bread & butter; What do you think: agree?


1very and 1Very should be treated the same way: in both cases a space goes after the digit.

I've tried this:

<?php
$string="TheLion is walking(proudly) through theJungle,but he misses hisTeddy.1very sad day!It is VegeterianDay,too. How can we help him?Maybe withBambi&a good song! Or with bread & butter;What do you think:agree?";
echo trim(preg_replace_callback('~\b\'\b(*SKIP)(*F)|\s*(\p{P}+)\s*~u', function($m) {
    return preg_replace('~\X(?=\X)~u', '$0 ', $m[1]) . ' ';
}, $string)); 
?>

Result:

TheLion is walking( proudly) through theJungle, but he misses hisTeddy. 1very sad day! It is VegeterianDay, too. How can we help him? Maybe withBambi& a good song! Or with bread& butter; What do you think: agree?
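
As far as I can tell, this output follows from how the pattern is built: `\p{P}` matches `(` and `&` just like `,` or `.`, and the callback drops the whitespace around each punctuation run and then appends a single space. So the space always lands after the mark, never before it, and nothing in the pattern looks at letter case or digits. A minimal sketch to reproduce this on small inputs (the function name `spaceAfterPunctuation` is made up for illustration):

```php
<?php
// Hypothetical helper wrapping the attempted replacement so it can be run on small inputs.
function spaceAfterPunctuation(string $s): string
{
    return trim(preg_replace_callback(
        '~\b\'\b(*SKIP)(*F)|\s*(\p{P}+)\s*~u',
        function ($m) {
            // Separate stacked punctuation marks, then append one space.
            return preg_replace('~\X(?=\X)~u', '$0 ', $m[1]) . ' ';
        },
        $s
    ));
}

echo spaceAfterPunctuation('walking(proudly)'), "\n"; // walking( proudly)
echo spaceAfterPunctuation('bread & butter'), "\n";   // bread& butter
echo spaceAfterPunctuation('hisTeddy.1very'), "\n";   // hisTeddy. 1very
```

The three calls show the three problems separately: the space goes on the wrong side of `(`, the existing space before `&` is removed, and camel case and digit/letter boundaries are left alone.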


Thanks a lot


Solution

  • Thanks to @Thefourthbird

    <?php
    $str = "TheLion is walking(proudly) through theJungle,but he misses hisTeddy.1very sad day!It is VegeterianDay,too. How can we help him?Maybe withBambi&a good song! Or with bread & butter;What do you think:agree?";
    $re = '/\b(?=[(][A-Za-z])|(?<=[,.!;:?)])\b|(?<=[a-z])(?=[A-Z])|(?<=[a-z]&)|(?=&[a-z])|(?<=[0-9])(?=[a-zA-Z])/m';
    $subst = ' ';
    $result = preg_replace($re, $subst, $str);
    echo $result;
    ?>
    

    Result:

    The Lion is walking (proudly) through the Jungle, but he misses his Teddy. 1 very sad day! It is Vegeterian Day, too. How can we help him? Maybe with Bambi & a good song! Or with bread & butter; What do you think: agree?

    Have a nice weekend!
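
To make the pattern easier to read and adjust, here is a sketch of the same alternation written with the `x` (extended) flag, one branch per line with a comment. The wrapper name `insertMissingSpaces` is an assumption for illustration, not part of the original answer. Every branch is a zero-width assertion, so `preg_replace` only inserts a space at the matched position and never removes characters.

```php
<?php
// Hypothetical wrapper name; the branches are the ones from the pattern above.
// The m flag is omitted here because the pattern has no ^ or $ anchors.
function insertMissingSpaces(string $str): string
{
    $re = <<<'RE'
~
  \b(?=[(][A-Za-z])        # before "(" glued to a preceding word and followed by a letter
| (?<=[,.!;:?)])\b         # after , . ! ; : ? ) when a word character follows directly
| (?<=[a-z])(?=[A-Z])      # between a lowercase and an uppercase letter: theJungle
| (?<=[a-z]&)              # after "&" glued to a preceding lowercase letter
| (?=&[a-z])               # before "&" glued to a following lowercase letter
| (?<=[0-9])(?=[a-zA-Z])   # between a digit and a letter: 1very, 1Very
~x
RE;
    return preg_replace($re, ' ', $str);
}

echo insertMissingSpaces('hisTeddy.1Very sad day!It'), "\n"; // his Teddy. 1 Very sad day! It
```

Note that the character classes are ASCII-only, and the `&` branches only fire next to lowercase letters, so an input such as `Bambi&A` would only get a space after the `&`, not before it.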