
Delete brackets containing non-numbers from a string


A string may contain non-numeric content inside round, curly or square brackets, and those bracketed groups need to be deleted. The input can also span multiple lines, for example:
Input:

# (1437)  
# {19} (917) (1437)  
# {19} (1437) [1449] [2474] [2098] [1351] [508]   [969]( MONDAY) [sunday]

Desired output:

# (1437)  
# {19} (917) (1437)  
# {19} (1437) [1449] [2474] [2098] [1351] [508] [969]

However, instead of removing just ( MONDAY) [sunday], the regex below removes the last line completely:

$re = '/^(#\h*)((\[\d+]|\{\d+}|\(\d+\))(?:\h+(?3))*)$/m';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
echo implode("\n", array_map(fn($x) => $x[1] . implode(' ', array_unique(explode(' ', $x[2]))), $matches));
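The last line disappears because the pattern is anchored with ^ and $ (under the m flag) and only allows numeric groups in between, so a line that contains ( MONDAY) fails to match as a whole and never ends up in $matches. A small check of that, assuming the sample input with the trailing spaces removed:

```php
<?php
// Sample input from the question (trailing spaces at line ends removed).
$str = "# (1437)\n"
     . "# {19} (917) (1437)\n"
     . "# {19} (1437) [1449] [2474] [2098] [1351] [508]   [969]( MONDAY) [sunday]";

// The question's pattern: with ^...$ and the m flag, a line only matches
// if it consists entirely of numeric groups separated by spaces.
$re = '/^(#\h*)((\[\d+]|\{\d+}|\(\d+\))(?:\h+(?3))*)$/m';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

// Only the first two lines match; the third is never captured at all.
echo count($matches), "\n";
foreach ($matches as $m) {
    echo $m[0], "\n";
}
```

So the problem is not the array_unique/implode step: the third line is simply absent from $matches.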

Solution

  • In PHP (PCRE), you can use this regex with a conditional construct:

    (?:(\()|({)|\[)\D*(?(1)\)|(?(2)}|]))\s*
    


    RegEx Details:

    • (?:: Start non-capture group
      • (\(): Match ( and capture in group #1
      • |: OR
      • ({): Match { and capture in group #2
      • |: OR
      • \[: Match a [
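    • ): End of non-capture group
    • \D*: Match 0 or more non-digits (greedy, so it can also run over newlines and other brackets before backtracking)
    • (?(1)\): Start of conditional construct: if group #1 was captured, match )
      • |: Otherwise
      • (?(2)}|]): If group #2 was captured, match }, otherwise match ]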
    • ): End of conditional construct
    • \s*: Match 0 or more whitespace characters (line breaks included)
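To see the conditional construct pick the closing bracket, here is a minimal sketch on a few made-up strings (illustrative only, not from the question). A match can only end on the closing bracket that pairs with the captured opening one, so mismatched pairs and numeric groups are left untouched:

```php
<?php
// Same pattern as above; (?(1)...|...) chooses ')' , '}' or ']'
// depending on which opening bracket was captured.
$re = '/(?:(\()|({)|\[)\D*(?(1)\)|(?(2)}|]))\s*/';

$samples = [
    '(abc) keep',  // round pair, non-numeric: removed
    '{abc} keep',  // curly pair, non-numeric: removed
    '[abc] keep',  // square pair, non-numeric: removed
    '(abc] keep',  // mismatched pair: no closing ')' so nothing is removed
    '(123) keep',  // numeric content: \D* cannot cross the digits
];

$results = array_map(fn($s) => preg_replace($re, '', $s), $samples);
echo implode("\n", $results), "\n";
```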

    Code:

    $r = preg_replace('/(?:(\()|({)|\[)\D*(?(1)\)|(?(2)}|]))\s*/', '', $s);
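Applied to the question's input, assuming $s holds the three sample lines (trailing spaces dropped), that call can be checked like this:

```php
<?php
// Question's sample input; only the last line has non-numeric groups.
$s = "# (1437)\n"
   . "# {19} (917) (1437)\n"
   . "# {19} (1437) [1449] [2474] [2098] [1351] [508]   [969]( MONDAY) [sunday]";

// Remove every bracket pair whose content contains no digits,
// together with any whitespace that follows it.
$r = preg_replace('/(?:(\()|({)|\[)\D*(?(1)\)|(?(2)}|]))\s*/', '', $s);
echo $r, "\n";
```

Only whitespace after a removed group is consumed, so the three spaces before [969] survive and that line is not byte-for-byte the desired output; if that matters, a second pass such as preg_replace('/\h+/', ' ', $r) should collapse them.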
    

    A slightly more efficient version of the above regex, using negated character classes instead of \D* to avoid backtracking:

    (?:(\()|({)|\[)(?(1)[^)\d]*\)|(?(2)[^}\d]*}|[^]\d]*]))\s*
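Besides being faster, the negated classes also change what gets matched: \D* runs to the end of a non-digit stretch and then backtracks to the last fitting closing bracket, which can swallow ordinary text between two non-numeric groups. A quick comparison on a made-up line:

```php
<?php
$greedy = '/(?:(\()|({)|\[)\D*(?(1)\)|(?(2)}|]))\s*/';
$tight  = '/(?:(\()|({)|\[)(?(1)[^)\d]*\)|(?(2)[^}\d]*}|[^]\d]*]))\s*/';

$line = '# (a) keep (c)';

// \D* backtracks to the LAST ')' of the non-digit run,
// so "keep" is removed together with both groups.
$greedyResult = preg_replace($greedy, '', $line);

// [^)\d]* cannot pass a ')', so each group ends at its own closer.
$tightResult = preg_replace($tight, '', $line);

// The '|' marks the end of each string so trailing spaces are visible.
echo $greedyResult, "|\n";
echo $tightResult, "|\n";
```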
    

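Since the input may be multiline, one caveat: \s* also matches line breaks, so a non-numeric group at the end of a line other than the last pulls the next line up onto it. A hedged variant (not part of the answer above) swaps the trailing \s* for \h*, which matches only horizontal whitespace and keeps the line breaks; the sample string is made up for illustration:

```php
<?php
// Non-numeric group at the END of a line that is not the last one.
$s = "# (1437) ( MONDAY)\n# {19} (917)";

// The efficient pattern from the answer, ending in \s*.
$withS = '/(?:(\()|({)|\[)(?(1)[^)\d]*\)|(?(2)[^}\d]*}|[^]\d]*]))\s*/';
// Hypothetical variant: same pattern, but ending in \h* (spaces/tabs only).
$withH = '/(?:(\()|({)|\[)(?(1)[^)\d]*\)|(?(2)[^}\d]*}|[^]\d]*]))\h*/';

$joined = preg_replace($withS, '', $s); // "\n" is consumed, lines are joined
$kept   = preg_replace($withH, '', $s); // "\n" is left in place

var_dump($joined);
var_dump($kept);
```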