Tags: c#, regex, camelcasing

Splitting CamelCase with regex


I have this code to split CamelCase by regular expression:

Regex.Replace(input, "(?<=[a-z])([A-Z])", " $1", RegexOptions.Compiled).Trim();

However, it doesn't split this correctly: ShowXYZColours

It produces "Show XYZColours" instead of "Show XYZ Colours".

How do I get the desired result?


Solution

  • Unicode-aware

    (?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})
    

    Breakdown:

    (?=               # look-ahead: a position followed by...
      \p{Lu}\p{Ll}    #   an uppercase letter followed by a lowercase letter
    )                 #
    |                 # or
    (?<=              # look-behind: a position after...
      \p{Ll}          #   a lowercase letter
    )                 #
    (?=               # look-ahead: a position followed by...
      \p{Lu}          #   an uppercase letter
    )                 #
    

    Use it with your regex split function (Regex.Split in C#).


    EDIT: Of course, you can replace \p{Lu} with [A-Z] and \p{Ll} with [a-z] if ASCII letters are all you need, or if your regex engine does not understand Unicode categories.
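A minimal C# sketch of using the Unicode-aware pattern with Regex.Split, assuming .NET's System.Text.RegularExpressions. For input such as "ShowXYZColours" the first alternative also matches at index 0 (right before "Sh"), and .NET's Regex.Split then returns an empty first element, so the sketch filters empty pieces out. The helper name SplitCamelCase is hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The answer's pattern: a boundary before an uppercase+lowercase pair,
// or between a lowercase letter and an uppercase letter.
const string Pattern = @"(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})";

// Hypothetical helper. When the input starts with uppercase+lowercase,
// the pattern matches at index 0 and Regex.Split yields an empty first
// element, so empty pieces are dropped.
string[] SplitCamelCase(string input) =>
    Regex.Split(input, Pattern)
         .Where(part => part.Length > 0)
         .ToArray();

string[] words = SplitCamelCase("ShowXYZColours");
Console.WriteLine(string.Join(" ", words)); // Show XYZ Colours
```

If you want a single string rather than an array, replacing each match with a space and trimming the result gives the same words without the filtering step.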
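In .NET, the commented breakdown can be used almost as written: RegexOptions.IgnorePatternWhitespace makes the engine ignore unescaped whitespace in the pattern and treat # through the end of the line as a comment. The sketch below assumes that option and uses Regex.Replace to insert a space at each boundary instead of splitting; SplitCamelCase is again a hypothetical helper name.

```csharp
using System;
using System.Text.RegularExpressions;

// The commented breakdown as a pattern. With IgnorePatternWhitespace,
// unescaped whitespace is ignored and '#' starts a comment to end of line.
const string CommentedPattern = @"
    (?=               # look-ahead: a position followed by...
      \p{Lu}\p{Ll}    #   an uppercase letter followed by a lowercase letter
    )                 #
    |                 # or
    (?<=              # look-behind: a position after...
      \p{Ll}          #   a lowercase letter
    )                 #
    (?=               # look-ahead: a position followed by...
      \p{Lu}          #   an uppercase letter
    )                 #
";

var boundary = new Regex(CommentedPattern, RegexOptions.IgnorePatternWhitespace);

// Hypothetical helper: insert a space at every boundary; Trim removes the
// space added at index 0 when the input starts with uppercase+lowercase.
string SplitCamelCase(string input) => boundary.Replace(input, " ").Trim();

Console.WriteLine(SplitCamelCase("ShowXYZColours")); // Show XYZ Colours
```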
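To see what the ASCII-only variant from the EDIT gives up, this sketch runs both patterns on the made-up identifier "ZeitÄnderung": Ä is an uppercase letter (\p{Lu}) but lies outside [A-Z], so only the Unicode-aware pattern finds the boundary before it. SplitWith is a hypothetical helper, and the input is written with a \u escape so the source file stays ASCII.

```csharp
using System;
using System.Text.RegularExpressions;

// Unicode-aware pattern from the answer and the ASCII-only variant from the EDIT.
const string UnicodePattern = @"(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})";
const string AsciiPattern = @"(?=[A-Z][a-z])|(?<=[a-z])(?=[A-Z])";

// Hypothetical helper: space out the boundaries found by the given pattern.
string SplitWith(string pattern, string input) =>
    Regex.Replace(input, pattern, " ").Trim();

// "Zeit\u00C4nderung" is "ZeitÄnderung".
string sample = "Zeit\u00C4nderung";
string unicodeResult = SplitWith(UnicodePattern, sample); // "Zeit Änderung"
string asciiResult = SplitWith(AsciiPattern, sample);     // "ZeitÄnderung" (no split)

Console.WriteLine(unicodeResult);
Console.WriteLine(asciiResult);
```

For "ShowXYZColours" both variants give "Show XYZ Colours"; they only differ once letters outside A-Z are involved.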