regex, regex-lookarounds, lookbehind

Trying to match string A if string B is found anywhere before it


What I'm trying to do is this: if a string contains a substring that starts with "!" and is enclosed in "[" and "]", separate those brackets from the rest of the substring with a space, e.g. "[!foo]" --> "[ !foo ]", "[!bar]" --> "[ !bar ]", etc. Since that substring can be of variable length, I figured this had to be done with regex. My thought was to do this in two steps: first separate the first bracket, then separate the second bracket.

The first one isn't hard; the regex is just \[! and so I can just do str = str.replace(/\[!/g, "[ !"); in JavaScript. It's the second part I can't get to work.

Because now, I need to match "]" if the string literal "[ !" is found anywhere before it. So a simple positive lookbehind doesn't match because it only looks directly behind: (?<=\Q[ !\E)\] doesn't match.

And I still don't understand why, but I'm not allowed to make the positive lookbehind non-fixed length; (?<=\Q[ !\E.*)\] throws the error Syntax Error: Invalid regular expression: missing / in the console, and this regex debugger yields a pattern error explaining "A quantifier inside a lookbehind makes it non-fixed width".

Putting a non-capturing group of non-fixed width between the lookbehind and the capturing group doesn't work; (?<=\Q[ !\E)(?:.*)\] doesn't match.

One thing that won't work is just trying to match "[ !" at the start of the string, because this whole "[!foo]" string is actually itself a substring of an even bigger string and isn't at the beginning.

What am I missing?


Solution

  • Using 2 positive lookarounds, you can assert that what is on the left is an opening square bracket with (?<=\[)

    Then match an exclamation mark followed by one or more characters other than [ and ] using the negated character class in ![^[\]]+ and assert that what is on the right is a closing square bracket using (?=])

    Note that lookbehind is not supported in older JavaScript engines, so check the environments you need to target.

    (?<=\[)![^[\]]+(?=])
    

    In the replacement, use the matched substring $& with a space on each side: " $& "


    [
      "[!foo]",
      "[!bar]"
    ].forEach(s =>
      console.log(s.replace(/(?<=\[)![^[\]]+(?=])/g, " $& "))
    )


    Or you could use 3 capturing groups instead, which does not need lookbehind:

    (\[)(![^\]]+)(\])
    

    In the replacement use

    $1 $2 $3
    


    [
      "[!foo]",
      "[!bar]"
    ].forEach(s =>
      console.log(s.replace(/(\[)(![^\]]+)(\])/g, "$1 $2 $3"))
    )
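
Since the question mentions that "[!foo]" sits inside a larger string, here is a minimal sketch of both ideas applied to such a string. The function names padBangBrackets and padInTwoSteps are made up for illustration and are not part of the original answer. The second function assumes an engine with lookbehind support. JavaScript has no \Q...\E quoting, so the "[" is escaped by hand; in a regex literal an unescaped "[" opens a character class, which is likely why the patterns in the question failed to parse.

```javascript
// Hypothetical helper (not from the original answer): pads "[!...]" groups
// with spaces wherever they occur inside a larger string.
function padBangBrackets(str) {
  // Group 1 captures "!" plus one or more characters that are not brackets.
  return str.replace(/\[(![^[\]]+)\]/g, "[ $1 ]");
}

// The two-step idea from the question, assuming lookbehind is available.
// JavaScript has no \Q...\E quoting, so "[" is escaped with a backslash.
function padInTwoSteps(str) {
  return str
    .replace(/\[!/g, "[ !")                // step 1: "[!" becomes "[ !"
    .replace(/(?<=\[ ![^\]]*)\]/g, " ]");  // step 2: "]" that closes a "[ !..." group
}

const sample = "see [!foo] and [bar], then [!baz]";
console.log(padBangBrackets(sample));
console.log(padInTwoSteps(sample));
```

Both calls leave "[bar]" untouched because it does not start with "!". In step 2, the [^\]]* inside the lookbehind stops at the previous "]", so a closing bracket is only padded when "[ !" opens the same group.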