regex, regex-lookarounds, regex-group

Regex: Match pattern unless preceded by pattern containing element from the matching character class


I am having a hard time coming up with a regex to match a specific case:

This can be matched:

any-dashed-strings
this-can-be-matched-even-though-its-big

This cannot be matched (strings starting with elem- or asdf-, or a single -):

elem-this-cannot-be-matched
asdf-this-cannot-be-matched
-

So far what I came up with is:

/\b(?!elem-|asdf-)([\w\-]+)\b/

But I keep matching a single - and the whole -this-cannot-be-matched suffix. I cannot figure out how to conditionally ignore a character that is part of the matching character class, and how to avoid matching the rest of the string when one of those prefixes is present.

I am currently working with the Oniguruma engine (Ruby 1.9+/PHP multi-byte string module).

If possible, please elaborate on the solution. Thanks a lot!


Solution

  • If lookbehind is supported, you can assert a whitespace boundary to the left, and make the elem/asdf alternation (without the hyphen) optional, so the lookahead also rejects a string that starts with a bare hyphen.

    (?<!\S)(?!(?:elem|asdf)?-)[\w-]+\b
    

    Explanation

    • (?<!\S) Assert a whitespace boundary to the left
    • (?! Negative lookahead, assert that what is directly to the right is not
      • (?:elem|asdf)?- Optionally match elem or asdf followed by -
    • ) Close the lookahead
    • [\w-]+ Match 1+ word chars or -
    • \b A word boundary

    See a regex demo.
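
As a quick check in Ruby, here is a minimal sketch that runs the pattern from the question next to the lookbehind version. The sample string is assembled from the examples in the question, and the constant names are only illustrative. The original pattern picks up the -this-cannot-be-matched suffix because \b also matches between "elem" and "-", so the engine can start a new match in the middle of a token.

```ruby
# Pattern from the question: \b also sits between "elem" and "-",
# so a match can begin in the middle of a token.
ORIGINAL = /\b(?!elem-|asdf-)([\w\-]+)\b/

# Pattern from the answer: (?<!\S) only lets a match start at the
# beginning of the string or directly after whitespace.
FIXED = /(?<!\S)(?!(?:elem|asdf)?-)[\w-]+\b/

# Sample input built from the question's examples (an assumption about
# how the real data is laid out).
SAMPLE = "any-dashed-strings this-can-be-matched-even-though-its-big " \
         "elem-this-cannot-be-matched asdf-this-cannot-be-matched -"

# The original pattern has a capture group, so scan returns nested arrays.
p SAMPLE.scan(ORIGINAL).flatten
# => ["any-dashed-strings", "this-can-be-matched-even-though-its-big",
#     "-this-cannot-be-matched", "-this-cannot-be-matched"]

p SAMPLE.scan(FIXED)
# => ["any-dashed-strings", "this-can-be-matched-even-though-its-big"]
```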

    Or a version with a capture group and without a lookbehind:

    (?:\s|^)(?!(?:elem|asdf)?-)([\w-]+)\b
    

    See another regex demo.
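
For the lookbehind-free version, the leading whitespace is consumed by the match, so the token itself has to be read from capture group 1. A minimal Ruby sketch follows; allowed_tokens is a hypothetical helper name and the input string is made up for illustration.

```ruby
# Lookbehind-free variant from the answer; the token is in capture group 1.
TOKEN_WITH_GROUP = /(?:\s|^)(?!(?:elem|asdf)?-)([\w-]+)\b/

# Hypothetical helper: returns every allowed token found in text.
def allowed_tokens(text)
  # With a capture group, String#scan returns one array per match,
  # so flatten yields the plain list of group-1 values.
  text.scan(TOKEN_WITH_GROUP).flatten
end

p allowed_tokens("any-dashed-strings elem-this-cannot-be-matched - ok-token")
# => ["any-dashed-strings", "ok-token"]
```

Note that in Ruby ^ matches at the start of every line, not only at the start of the string, so tokens at the beginning of each line of a multi-line input are also considered.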