java regex pattern-matching camelcasing

Regex pattern to match many combinations for a camel case string


I'm trying to create a regex pattern (or several). For instance, given SomeCamelStringToCombine, it should match the following substrings:

Some, Camel, String, To, Combine, SomeCamel, SomeCamelString, SomeCamelStringTo, SomeCamelStringToCombine, CamelString, CamelStringTo, CamelStringToCombine, StringTo, StringToCombine, ToCombine.

I managed to create this pattern: /(?=([\p{Lu}]+[\p{L}]+))/, but it only matches

SomeCamelStringToCombine, CamelStringToCombine, StringToCombine, ToCombine, Combine.
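
For reference, here is a minimal Java sketch that reproduces this behaviour. The class and method names (`SuffixDemo`, `suffixes`) are made up for illustration. Since `[\p{L}]+` is greedy and also matches uppercase letters, each match starts at an uppercase letter and runs to the end of the string, which would explain why only suffixes come out:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SuffixDemo {
    // Collects group 1 of every zero-width match of the pattern from the question.
    static List<String> suffixes(String input) {
        Pattern p = Pattern.compile("(?=([\\p{Lu}]+[\\p{L}]+))");
        Matcher m = p.matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            // The lookahead itself consumes nothing; the text sits in group 1.
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(suffixes("SomeCamelStringToCombine"));
    }
}
```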

I don't know whether I should modify it or create additional patterns, and either way I don't know how. I'm using Java for the matching.

Can I ask you for help or tips?


Solution

  • You could use a fixed-size regex that finds combinations of up to a set
    number of words. The one below captures up to 5 words, but you could
    extend it to any size.

    You could easily create the regex programmatically.

    Just exclude the capture groups that did not match from the result array.

    Note that after the first match you can also skip groups 1-5 to avoid
    duplicate single words (as long as the input has no more words than the
    pattern captures).

    (?=([A-Z][a-z]+)([A-Z][a-z]+)([A-Z][a-z]+)?([A-Z][a-z]+)?([A-Z][a-z]+)?)(?=(\1\2))(?=(\6\3)?)(?=(\7\4)?)(?=(\8\5)?)\1

    https://regex101.com/r/ta9Qzq/1

     (?=
          ( [A-Z] [a-z]+ )              # (1), required Word 1
          ( [A-Z] [a-z]+ )              # (2), required Word 2
          ( [A-Z] [a-z]+ )?             # (3), optional Word 3
          ( [A-Z] [a-z]+ )?             # (4), optional Word 4
          ( [A-Z] [a-z]+ )?             # (5), optional Word 5
     )
     (?=
          ( \1 \2 )                     # (6), required Word 1,2
     )
     (?=
          ( \6 \3 )?                    # (7), optional Word 1,2,3
     )
     (?=
          ( \7 \4 )?                    # (8), optional Word 1,2,3,4
     )
     (?=
          ( \8 \5 )?                    # (9), optional Word 1,2,3,4,5
     )
     \1                            # Advance position by 1 word
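
The approach above could be sketched in Java roughly as follows. This is only an illustration under a few assumptions: the names `CamelCombos`, `buildRegex` and `combinations` are hypothetical, words are taken to be `[A-Z][a-z]+` as in the pattern above, and a `LinkedHashSet` is used to drop duplicates instead of skipping groups 1-5 by hand. It relies on Java returning `null` from `group(n)` for a group that did not take part in the match.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CamelCombos {
    static final String WORD = "([A-Z][a-z]+)";

    // Builds the fixed-size pattern for up to maxWords words (maxWords >= 2).
    static String buildRegex(int maxWords) {
        StringBuilder sb = new StringBuilder("(?=");
        for (int i = 1; i <= maxWords; i++) {
            sb.append(WORD);
            if (i > 2) sb.append('?');          // words 3..n are optional
        }
        sb.append(')');
        sb.append("(?=(\\1\\2))");              // first cumulative group: words 1,2
        for (int i = 3; i <= maxWords; i++) {
            int prevCumulative = maxWords + i - 2;  // group holding words 1..i-1
            sb.append("(?=(\\").append(prevCumulative)
              .append("\\").append(i).append(")?)");
        }
        sb.append("\\1");                       // advance position by one word
        return sb.toString();
    }

    // Collects every non-null capture group from every match, in encounter order.
    static Set<String> combinations(String input, int maxWords) {
        Matcher m = Pattern.compile(buildRegex(maxWords)).matcher(input);
        Set<String> out = new LinkedHashSet<>();
        while (m.find()) {
            for (int g = 1; g <= m.groupCount(); g++) {
                String s = m.group(g);
                if (s != null) out.add(s);      // skip groups that did not match
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(combinations("SomeCamelStringToCombine", 5));
    }
}
```

With `maxWords` set to 5, `buildRegex` should produce the same pattern as the one-line regex above. Two things to keep in mind: the pattern requires at least two words, so an input consisting of a single word yields nothing, and combinations longer than `maxWords` words are not found.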