java regex split numbers camelcasing

Split camel case text with number groups


I have strings containing camel-case text and numbers, and I would like to split them into their parts.

E.g. the string "abcDefGhi345J6" should be split into

["abc", "Def", "Ghi", "345", "J", "6"]

My best effort is

"abcDefGhi345J6".split("(?=\\p{Lu})|(?!\\p{Lu})(?=\\d+)")

which gives me

["abc", "Def", "Ghi", "3", "4", "5", "J", "6"]

PS: The answers to the questions marked as duplicates do NOT give the expected output, because they are not Unicode agnostic.


Solution

  • You may use this regex for splitting:

    (?=\p{Lu})|(?<!\d)(?=\d)
    

    For Java code:

    String[] arr = string.split("(?=\\p{Lu})|(?<!\\d)(?=\\d)");
    

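    (?=\p{Lu}) matches the position before any uppercase letter. (?<!\d)(?=\d) matches a position that has a digit ahead but no digit behind it, i.e. only the start of each run of digits, so a group such as 345 stays in one piece.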
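
To see the difference between the two patterns side by side, here is a minimal runnable sketch. The class name CamelCaseSplitter and the helper splitCamelCase are made up for illustration and are not part of the original answer. It assumes Java 8 or later, where a zero-width match at the start of the input does not produce an empty leading element.

```java
import java.util.Arrays;

public class CamelCaseSplitter {

    // Split before every uppercase letter, or at the start of a run of digits.
    static final String BOUNDARY = "(?=\\p{Lu})|(?<!\\d)(?=\\d)";

    // Hypothetical helper name, used only for this example.
    static String[] splitCamelCase(String input) {
        return input.split(BOUNDARY);
    }

    public static void main(String[] args) {
        // Pattern from the answer: digit groups stay together.
        System.out.println(Arrays.toString(splitCamelCase("abcDefGhi345J6")));
        // prints [abc, Def, Ghi, 345, J, 6]

        // Pattern from the question: (?!\p{Lu})(?=\d+) is true before
        // every digit, so each digit becomes its own element.
        String attempt = "(?=\\p{Lu})|(?!\\p{Lu})(?=\\d+)";
        System.out.println(Arrays.toString("abcDefGhi345J6".split(attempt)));
        // prints [abc, Def, Ghi, 3, 4, 5, J, 6]
    }
}
```

Two points to keep in mind: in Java, \d matches only ASCII digits unless Pattern.UNICODE_CHARACTER_CLASS is set, so \p{Nd} may be a better fit if non-ASCII digits need to be grouped as well. Also, the pattern does not split between a digit and a following lowercase letter, so an input such as "12ab" comes back as a single element.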