Regex (php) to match single [ or single ] but ignore anything between [[ ]]?


I have a string that contains combinations of [ ] [[ ]] ][. I need to replace each single [ and ] with < and >, but leave alone (not match) anything that is between [[ and ]].

I thought I could do this with a regex, but I'm really struggling to get it to work because the complexity is just beyond me at the moment.

Example strings:

[a] [b]  <- should replace every [ with < and every ] with > so <a> <b>

[a][b]   <- should replace every [ with < and every ] with > so <a><b>

[[abc][a][b]]  <- should not replace anything. will always start with [[ and end with ]]

So thinking about this logically, I can do it in a loop with PHP but I really want to try and use a preg_replace if possible.

The logic, as far as I can tell, is to replace [ with < and ] with > EXCEPT between a [[ and ]], but I'm not sure I can even do that in a regex. I can make it work partially by using lookahead/lookbehind, but that still matches [ and ] between [[ and ]] (e.g. [[ [a] ]]).

So far I've got

    /(?<!(^|)\[)\[[^\]\[\[]*\]/gmi

This works to spot [a] but not [[a]], but it fails if I have [[a [b] c]]. At this point I'm not worried about the replacement; I just need the regex to match / not match correctly.


Solution

  • You can use

    preg_replace('~(\[\[(?:(?!\[\[|]]).|(?1))*]])(*SKIP)(*F)|\[([^][]*)]~s', '<$2>', $text)
    

    See the PHP demo and the regex demo.

    Details:

    • (\[\[(?:(?!\[\[|]]).|(?1))*]])(*SKIP)(*F) - Group 1: [[, then zero or more occurrences of either any char that does not start a [[ or ]] char sequence, or the whole Group 1 pattern recursed (this handles nested [[ ... ]]), and then ]]. Once this is matched, (*SKIP)(*F) discards the match and the next search starts at the position where the failure occurred, i.e. right after the closing ]]
    • | - or
    • \[([^][]*)] - a [, then zero or more chars other than [ and ] captured into Group 2, and then a ].
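
To see the two branches described above in action, here is a minimal sketch that runs the pattern over the strings from the question. The helper names `replace_single_brackets` and `replace_single_brackets_callback` are made up for illustration. The second helper is an assumed variant that is not part of the answer: it drops the `(*SKIP)(*F)` verbs and uses `preg_replace_callback` to return a `[[ ... ]]` block unchanged, which makes the "skip this, rewrite that" logic explicit.

```php
<?php
// Pattern copied verbatim from the answer above.
const BRACKET_PATTERN = '~(\[\[(?:(?!\[\[|]]).|(?1))*]])(*SKIP)(*F)|\[([^][]*)]~s';

// Hypothetical helper: wraps the preg_replace call from the answer.
// Falls back to the input if PCRE reports an error (preg_replace returns null).
function replace_single_brackets(string $text): string
{
    return preg_replace(BRACKET_PATTERN, '<$2>', $text) ?? $text;
}

// Assumed variant (not from the answer): same alternation, no backtracking verbs.
// The [[ ... ]] branch is matched and handed back untouched by the callback.
function replace_single_brackets_callback(string $text): string
{
    $pattern = '~(\[\[(?:(?!\[\[|]]).|(?1))*]])|\[([^][]*)]~s';

    $result = preg_replace_callback($pattern, function (array $m): string {
        // Group 1 is non-empty only when the [[ ... ]] branch matched.
        if ($m[1] !== '') {
            return $m[0];
        }
        // Otherwise the single-bracket branch matched; Group 2 holds its content.
        return '<' . $m[2] . '>';
    }, $text);

    return $result ?? $text;
}

$samples = [
    '[a] [b]',              // => <a> <b>
    '[a][b]',               // => <a><b>
    '[[abc][a][b]]',        // => unchanged
    '[[a [b] c]]',          // => unchanged
    '[x] [[a [b] c]] [y]',  // => <x> [[a [b] c]] <y>
];

foreach ($samples as $text) {
    echo $text, ' => ', replace_single_brackets($text), PHP_EOL;
}
```

Both helpers should give the same result for these inputs. The callback form is a little longer, but it does not rely on PCRE backtracking control verbs, which may be easier to read if `(*SKIP)(*F)` is unfamiliar.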