Tags: php, regex, replace, preg-replace, preg-match-all

Zero-pad numbers to a minimum length if preceded by a word from a whitelist


I am trying to find the numbers in a string that appear after certain words and place leading zeros in front of the numbers.

Examples:

  • Apt 4, John Ave should be Apt 0004, John Ave
  • Block 52, Lane Drive should be Block 0052, Lane Drive

Note: I only want to add enough leading zeros to make it a 4-digit number.

My code partially works; however, it adds leading zeros to every occurrence of a matched number in the string, not only to the numbers that follow one of the listed words. I think preg_replace() should be able to achieve this with better results.

$s = '23 St John Apt 92 rer 4, Wellington Country Block 5 No value  test 54545 tt 232';
preg_match_all('/Apartment\s[0-9]+|Apt\s[0-9]+|Block\s[0-9]+|Department\s[0-9]+|Lot\s[0-9]+|Number\s[0-9]+|Villa\s[0-9]+/i', $s, $matches);

var_dump($matches);

foreach($matches[0] as $word)
{
    preg_match_all('!\d+!', $word, $matches2);

    foreach ($matches2[0] as $value)
    {
        $value = trim($value);
    
        if (strlen($value) == 1)
        {
            $s = str_replace($value, "000" . $value, $s);
        }
        elseif (strlen($value) == 2)
        {
            $s = str_replace($value, "00" . $value, $s);
        }
        elseif (strlen($value) == 3)
        {
            $s = str_replace($value, "0" . $value, $s);
        }
        else
        {
            //nothing
        }
    }
}

echo $s;
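
The over-replacement described above can be reproduced in isolation. The following is a minimal sketch, assuming a shortened input string that is not part of the original question: str_replace() substitutes every occurrence of the digit string, so a "5" captured after "Block" is also padded inside the unrelated number "54545".

```php
<?php
// Shortened, hypothetical input: one number after a keyword, one unrelated number.
$s = 'Block 5 test 54545';

// str_replace() has no notion of position: every "5" in the string is replaced,
// including the three that are part of "54545".
$padded = str_replace('5', '0005', $s);

echo $padded, PHP_EOL; // Block 0005 test 00054000540005
```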

Solution

  • You can use the str_pad function inside a preg_replace_callback callback:

    Pad a string to a certain length with another string

    Code:

    $re = '/\b((?:Apartment|Apt|Block|Department|Lot|Number|Villa)\s*)([0-9]+)/i';
    $str = "23 St John Apt 92 rer 4, Wellington Country Block 5 No value  test 54545 tt 232";
    $result = preg_replace_callback($re, function ($m) {
        // $m[1] is the keyword plus trailing whitespace, $m[2] is the number to pad
        return $m[1] . str_pad($m[2], 4, "0", STR_PAD_LEFT);
    }, $str);
    echo $result; // 23 St John Apt 0092 rer 4, Wellington Country Block 0005 No value  test 54545 tt 232
    


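    I also added a \b word boundary at the start so that a keyword is not matched in the middle of a longer word (for example, the "Lot" in "Plot 5"), and grouped the keywords into a single alternation so that the whitespace and digit parts of the pattern are written only once.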
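
If the whitelist or the target width needs to vary, the same approach can be wrapped in a small function. This is only a sketch under stated assumptions: the name padNumbersAfterWords and its parameters are hypothetical (they do not appear in the answer above), and the arrow functions assume PHP 7.4 or later. It builds the alternation from an array with preg_quote and otherwise uses the same pattern shape and str_pad call.

```php
<?php
/**
 * Hypothetical helper (not from the original answer): zero-pads numbers
 * that directly follow one of the given keywords.
 */
function padNumbersAfterWords(string $text, array $words, int $width = 4): string
{
    // Escape each keyword so it is treated literally inside the pattern.
    $alternation = implode('|', array_map(
        fn(string $w): string => preg_quote($w, '/'),
        $words
    ));
    $pattern = '/\b((?:' . $alternation . ')\s*)([0-9]+)/i';

    // str_pad() never truncates, so numbers already $width digits or longer
    // pass through unchanged. Fall back to the input if the regex fails.
    return preg_replace_callback(
        $pattern,
        fn(array $m): string => $m[1] . str_pad($m[2], $width, '0', STR_PAD_LEFT),
        $text
    ) ?? $text;
}

$words = ['Apartment', 'Apt', 'Block', 'Department', 'Lot', 'Number', 'Villa'];

echo padNumbersAfterWords('Apt 4, John Ave', $words), PHP_EOL;     // Apt 0004, John Ave
echo padNumbersAfterWords('Block 54545, Plot 7', $words), PHP_EOL; // Block 54545, Plot 7
```

The second call shows both edge cases: "54545" is already longer than four digits and is left alone, and "Plot 7" is untouched because \b prevents "Lot" from matching inside "Plot".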