regex pcre

PCRE Regex - Match only brackets excluding enclosed content


I'm trying to match a pair of special characters, while excluding the enclosed content from the match. For example, ~some enclosed content~ should match only the pair of ~ and leave out some enclosed content entirely. I can only use vanilla PCRE, and capture groups aren't an option.

My pattern for matching the entire string is ~([^\s].*?(?<!\s))~. Matching the first and second ~ separately would also be acceptable.
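
To make the problem concrete, here is a minimal Python sketch. It assumes Python's built-in re module, which happens to accept this particular pattern unchanged; the helper name whole_match is only illustrative. It shows that the overall match spans the enclosed content as well as the two ~ characters:

```python
import re

# The pattern from the question, unchanged; Python's re accepts this syntax.
QUESTION_PATTERN = re.compile(r"~([^\s].*?(?<!\s))~")

def whole_match(text):
    """Return the full text matched by the question's pattern, or None."""
    m = QUESTION_PATTERN.search(text)
    return m.group(0) if m else None

# The overall match covers the delimiters AND the enclosed content.
print(whole_match("~some enclosed content~"))
```

The goal is to get the same acceptance rules, but with only the two ~ characters reported as matches.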


Solution

  • Looking at your pattern, you want a non-whitespace char right after the opening ~ and a non-whitespace char right before the closing ~.

    Since ~ is the delimiter, those non-whitespace chars should also not be ~ itself, so you might use:

    ~(?=[^~\s](?:[^~\r\n]*[^\s~])?~)|(?<=~)[^\s~](?:[^~\r\n]*[^\s~])?\K~
    

    Explanation

    • ~ Match the opening ~ literally
    • (?= Positive lookahead, assert that to the right is
      • [^~\s] Match a non-whitespace char other than ~
      • (?: Non-capture group
        • [^~\r\n]*[^\s~] Match zero or more chars other than ~ or a newline, followed by a non-whitespace char other than ~
      • )? Close the non-capture group and make it optional (to also match a single char, as in ~a~)
      • ~ Match the closing ~ (inside the lookahead, so it is not consumed)
    • ) Close the lookahead
    • | Or
    • (?<=~) Positive lookbehind, assert ~ directly to the left
    • [^\s~] Match a non-whitespace char other than ~
    • (?:[^~\r\n]*[^\s~])? Same optional pattern as in the lookahead
    • \K Reset the start of the reported match, so everything matched so far is left out of the result
    • ~ Match the closing ~ literally; this is the only char reported for this alternative
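
As a rough way to check which positions the pattern reports, here is a Python sketch. It is an emulation, not the PCRE pattern itself: Python's built-in re module does not support \K, so the second alternative is written without it, and the sketch relies on the fact that every match of either alternative ends in a ~. The position of that final char is what PCRE would report. The function name delimiter_positions is hypothetical.

```python
import re

# Emulation of the PCRE pattern for Python's re, which has no \K.
# Alternative 1 is unchanged; alternative 2 simply omits \K.
PATTERN = re.compile(
    r"~(?=[^~\s](?:[^~\r\n]*[^\s~])?~)"    # opening ~, closing ~ only asserted
    r"|(?<=~)[^\s~](?:[^~\r\n]*[^\s~])?~"  # content plus closing ~
)

def delimiter_positions(text):
    """Return the indices of the ~ chars the PCRE pattern would report.

    Each match ends in a delimiter, so its index is m.end() - 1. For the
    second alternative this mimics what \\K does in PCRE: the content is
    consumed but dropped from the reported match.
    """
    return [m.end() - 1 for m in PATTERN.finditer(text)]

print(delimiter_positions("~some enclosed content~"))
print(delimiter_positions("~ a~"))  # whitespace after the opening ~
```

In real PCRE the original pattern can be used as-is and each match is a single ~; the index arithmetic here only stands in for \K.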
