.net regex balancing-groups

Regex to match escapable strings?


I wrote this regex to match strings:

(?>(?<Quote>""|').*?(?<!\\)\k<Quote>)

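i.e., some text enclosed in quotes. It also supports escaping, so it matches "hello\"world" in its entirety without stopping at the escaped inner quote, which is what I want. But I forgot about double-escaping. For example, "hello\\"world" is not a valid single string, because the escaped backslash leaves the second quote unescaped, yet my regex still matches the whole thing.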

I'm pretty sure this is possible to fix with balancing groups, but I've never really used them before. Anyone know how to write this?
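
To make the failure concrete, here is a minimal sketch of the same idea in Python rather than .NET. It rests on a few assumptions: the doubled "" in the pattern above is read as C# verbatim-string escaping for a single ", the atomic group is dropped (nothing follows it, so it should not change the result), and the names NAIVE and naive_match are made up for illustration.

```python
import re

# Python port of the pattern from the question (assumptions: the doubled ""
# is one literal ", and the outer atomic group is omitted).
NAIVE = re.compile(r"""(?P<quote>"|').*?(?<!\\)(?P=quote)""")


def naive_match(text):
    """Return the text matched by the lookbehind-based pattern, or None."""
    m = NAIVE.search(text)
    return m.group(0) if m else None


single_escape = r'"hello\"world"'   # one backslash: the inner quote is escaped
double_escape = r'"hello\\"world"'  # two backslashes: the inner quote should close the string

print(naive_match(single_escape))
print(naive_match(double_escape))
```

Both calls return the entire input. For the second input the match should have ended right after the two backslashes, but the lookbehind only sees "a backslash before the quote" and cannot tell that the backslash is itself escaped.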


Solution

  • Regular expressions are not well suited to escaped constructs.

    I don't think it's possible to do this in any "nice" kind of way (if at all), although I'll post an edit if I figure out otherwise.

    Balancing group definitions are for nested constructs. Nesting doesn't happen in strings, so balancing group definitions don't seem to even be the right tool for this.


    Edit 1:

    It depends on how many features you're looking for. If you simply want to match a double-quoted string at the start of the input, allowing backslash escapes inside it, you can use the pattern

    ^"([^\\\"]|\\.)*"
    

    which, when escaped for code, turns out like

    "^\"([^\\\\\\\"]|\\\\.)*\""
    

    to match something like

    "Hello! \" Hi! \" "
    

    but as soon as you start adding more complicated requirements like Unicode escapes, it becomes a lot more tedious. Just parse the string by hand with a simple character loop; it should be much simpler.


    Edit 2:

    If you're curious about how balancing group definitions work anyway, I recommend reading page 430 of this book (34 in pdf).
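
For reference, the pattern from Edit 1 and the "do it by hand" suggestion could be sketched roughly as follows. This is Python, not .NET, and only an illustration under stated assumptions: ESCAPED_STRING and scan_quoted are hypothetical names, the ^ anchor is replaced by re.match (which already anchors at the start), the capturing group is made non-capturing, and the hand-written scanner also accepts single quotes, as the question's pattern does.

```python
import re

# Edit 1's pattern; re.match() anchors at the start, so the ^ is left out.
ESCAPED_STRING = re.compile(r'"(?:[^\\"]|\\.)*"')


def scan_quoted(text, start=0):
    """Scan the quoted string that opens at text[start].

    Returns the index just past the closing quote, or -1 if text[start]
    is not a quote character or the string is never closed.
    """
    if start >= len(text) or text[start] not in "\"'":
        return -1
    quote = text[start]
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the backslash and whatever character it escapes
        elif ch == quote:
            return i + 1  # found the unescaped closing quote
        else:
            i += 1
    return -1  # ran off the end: unterminated string


sample = r'"hello\\"world"'
end = scan_quoted(sample)
print(sample[:end])                            # the string literal found by hand
print(ESCAPED_STRING.match(sample).group(0))   # the same literal via the regex
```

On the double-escaped input both approaches stop after the two backslashes, because each consumes a backslash together with the character that follows it instead of looking backwards from the quote. Extra rules such as Unicode escapes would go in the backslash branch of the loop.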