regex sublimetext3

How to capture a word when it's not followed by hyphens, underscores, or alphanumerics


How do I capture the word "entity" only when it is not followed by a hyphen, an underscore, or an alphanumeric character, while leaving whatever else follows it out of the match?

For example, I want to capture the word "entity" in the following situations:

  • entity
  • entity,
  • [entity]

But I do NOT want it to capture the word in the following situations:

  • entity-foo
  • entity_bar
  • entityfoobar
  • entity0foo

The furthest I got is:

(entity)[^-\$a-zA-Z_0-9]

However, the match produced by the above regex includes the character that follows the word:

  • entity, includes the ,
  • entity] includes the ]

I'm trying to capture this token in a Sublime Syntax definition.


Solution

  • Sounds like a job for lookaheads!

    Something like this should work:

    (entity)(?=[\s,\]])
    

    Explanation:

    • (?<=\[)?: Not part of the pattern above. The (?<=regex) construct is a lookbehind, here one that looks for a [ character in front of entity. Making it optional with a trailing ? means it can never reject a match, so it is left out.
    • (entity): Matching the phrase entity and capturing it
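    • (?=[\s,\]]): A lookahead ((?=regex)) that requires the next character to be one of \s, , or ]. \s matches any whitespace character (spaces, tabs, newlines, etc.). A lookahead consumes nothing, so that character is not part of the match. Because one of these characters must be present, an entity with nothing at all after it is not matched.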
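
    To check the pattern against the examples from the question, here is a minimal sketch. It uses Python's re module purely as a stand-in for Sublime's regex engine (an assumption on my part; the lookahead syntax is the same), and the helper name captured is made up for illustration.

```python
import re

# Pattern from the answer: capture "entity" only when the next character
# is whitespace, a comma, or a closing bracket. The lookahead consumes nothing.
PATTERN = re.compile(r"(entity)(?=[\s,\]])")


def captured(text):
    """Return the full matched text, or None when the pattern does not match.

    Hypothetical helper, used only to inspect what the pattern matches.
    """
    match = PATTERN.search(text)
    return match.group(0) if match else None


samples = [
    "entity ", "entity,", "[entity]",            # should be captured
    "entity-foo", "entity_bar", "entityfoobar",  # should be rejected
    "entity0foo",
    "entity",                                    # nothing follows the word
]

for sample in samples:
    print(repr(sample), "->", captured(sample))
```

    In each captured case the match is exactly entity, without the trailing , or ]. The last sample, a bare entity with no character after it, is not matched by this pattern.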

    One caveat of my pattern is that entity] will be matched even without a leading [, a case your examples don't cover. The pattern could be extended to handle that, but it starts to get messy and may not be necessary anyway.
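
    If listing every allowed follower becomes unwieldy, one possible variation (not part of the pattern above, just a sketch) is a negative lookahead that states the requirement directly: reject entity when the next character is a hyphen or a word character. Python's re module is again only a stand-in for Sublime's engine, and is_captured is a hypothetical helper name.

```python
import re

# Hypothetical alternative, not the pattern from the answer above:
# (?![-\w]) fails the match when the next character is a hyphen or a
# word character (\w covers letters, digits, and the underscore).
NEGATIVE = re.compile(r"(entity)(?![-\w])")


def is_captured(text):
    """Return True when "entity" is found and not followed by - or \\w."""
    return NEGATIVE.search(text) is not None


for sample in ["entity", "entity,", "[entity]",
               "entity-foo", "entity_bar", "entityfoobar", "entity0foo"]:
    print(repr(sample), "->", is_captured(sample))
```

    Unlike the positive lookahead, this variant also accepts entity when nothing follows it, and it accepts any other punctuation after the word. It still does nothing about the missing leading [ mentioned above.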