Usually, to match complete words we use \b as a word delimiter, but when we are dealing with a compound word that includes punctuation, this method does not work well. For instance, take the following string:
basic school co-operative limited
If we apply the following regex we get co-operative and limited, as expected. This happens because of the order of the alternatives:
\b(co-operative|co|co.|limited)\b
What happens if I have no control over the order of the alternatives and I get the following regex?
\b(co|co.|co-operative|limited)\b
In this scenario, only co and limited would match, instead of co-operative and limited, because \b also succeeds between co and the hyphen. Is there any way to make the match independent of the order of the alternatives?
Thanks for your priceless help
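The order dependence described above can be reproduced with a short script. This is only a sketch using Python's re module (the question does not name a language, so Python is an assumption), and the variable names are illustrative:

```python
import re

text = "basic school co-operative limited"

# Longest alternative first: "co-operative" is tried before "co".
longest_first = re.compile(r"\b(co-operative|co|co.|limited)\b")

# Shortest alternative first: "co" matches, and the trailing \b succeeds
# between "o" and "-", so the engine never tries "co-operative".
shortest_first = re.compile(r"\b(co|co.|co-operative|limited)\b")

print(longest_first.findall(text))   # ['co-operative', 'limited']
print(shortest_first.findall(text))  # ['co', 'limited']
```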
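Since you want to match complete words, you could change the \b assertion at the end of the regex to a positive lookahead for whitespace or the end of the string. The hyphen after co then fails the lookahead, so the engine backtracks and tries the remaining alternatives until co-operative matches, e.g.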
\b(co|co.|co-operative|limited)(?=\s|$)
If you wanted to allow for certain punctuation after a word, you could add that into the lookahead, e.g.
\b(co|co.|co-operative|limited)(?=[\s.]|$)
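As a quick check of both patterns, here is a minimal sketch in Python's re module (the language is an assumption, and the second test string with a trailing full stop is made up for illustration):

```python
import re

text = "basic school co-operative limited"

# Same alternatives, shortest first, but the trailing \b is replaced by a
# lookahead that requires whitespace or the end of the string.
strict = re.compile(r"\b(co|co.|co-operative|limited)(?=\s|$)")
print(strict.findall(text))  # ['co-operative', 'limited']

# Variant whose lookahead also accepts a full stop after the word.
with_punct = re.compile(r"\b(co|co.|co-operative|limited)(?=[\s.]|$)")
print(with_punct.findall(text + "."))  # ['co-operative', 'limited']
```

Note that the unescaped . in co. matches any character, not only a literal full stop; if a literal full stop is intended, it would need to be written as co\. instead.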