Tags: regex, regex-lookarounds, regex-negation

How can I split camelCase with Regex without splitting McDonald?


Using the following pattern I can split camelCase: (\w*?[a-z]{1})([A-Z]{1})

But how can I avoid matching common names like McDonald or DeSanto?

I'm after:

Match: camelCase
Match: NewsToday
No Match: IBM
No Match: McDonalds (matches pattern above)
No Match: DeSanto   (matches pattern above)
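
To make the problem reproducible, here is a minimal sketch that runs the pattern above against the five samples. It assumes Python's `re` module purely for illustration (the question does not name a regex flavor), and `ORIGINAL`, `samples` and `results` are names chosen for this example.

```python
import re

# The pattern from the question, kept verbatim ({1} is redundant but harmless).
ORIGINAL = re.compile(r"(\w*?[a-z]{1})([A-Z]{1})")

samples = ["camelCase", "NewsToday", "IBM", "McDonalds", "DeSanto"]

# True if the pattern finds a match anywhere in the word.
results = {word: bool(ORIGINAL.search(word)) for word in samples}

for word, matched in results.items():
    print(f"{word}: {'match' if matched else 'no match'}")
```

Only IBM is rejected, because it contains no lowercase letter followed by an uppercase one; McDonalds and DeSanto match just like camelCase does.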

Solution

  • You can use either of the following patterns:

    \b(?!Mc|De)(\w*?[a-z])([A-Z])
    \b(?!(?:Mc|De)[A-Z])(\w*?[a-z])([A-Z])
    


    Details

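    • \b - a word boundary, so a match can only start at the beginning of a word. Without it, the regex engine would retry one character later and match McDonalds starting from the c.
    • (?!Mc|De) - a negative lookahead that fails the match if there is Mc or De immediately to the right of the current location
    • (?!(?:Mc|De)[A-Z]) - the lookahead in the second pattern: it fails the match only if Mc or De is immediately followed by an uppercase letter, so a word such as DebugMode still matches
    • (\w*?[a-z]) - Group 1: zero or more word characters, as few as possible, followed by a lowercase letter
    • ([A-Z]) - Group 2: an uppercase letter.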
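
The difference between the two patterns is easiest to see by running both on the same words. The following is a minimal sketch, assuming Python's `re` module (the answer itself is flavor-agnostic). The names `STRICT`, `NARROW` and `split_camel`, and the extra sample `DebugMode`, are hypothetical and were added for illustration only.

```python
import re

# First pattern: rejects any word that starts with "Mc" or "De".
STRICT = re.compile(r"\b(?!Mc|De)(\w*?[a-z])([A-Z])")
# Second pattern: rejects a word only when "Mc" or "De" is
# directly followed by an uppercase letter.
NARROW = re.compile(r"\b(?!(?:Mc|De)[A-Z])(\w*?[a-z])([A-Z])")


def split_camel(pattern, word):
    """Return (group 1, group 2) of the first match, or None if there is none."""
    m = pattern.search(word)
    return m.groups() if m else None


# "DebugMode" is not from the question; it shows where the patterns differ.
samples = ["camelCase", "NewsToday", "IBM", "McDonalds", "DeSanto", "DebugMode"]
for word in samples:
    print(word, split_camel(STRICT, word), split_camel(NARROW, word))
```

Both patterns accept camelCase and NewsToday and reject IBM, McDonalds and DeSanto. Only the second one accepts DebugMode, so it is probably the safer choice when ordinary words beginning with De or Mc are expected in the input.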