Tags: python, regex, regex-group, regex-lookarounds

Need a single RegEx for 4 similar but different input patterns


I want a single regular expression that splits each of the following 4 inputs into its parts:

Input_1: CountryINDIA
O/P: Country, INDIA

Input_2: CountryIndia
O/P: Country, India

Input_3: CountryINDIAAustralia
O/P: Country, INDIA, Australia

Input_4: CountryIndiaAustralia
O/P: Country, India, Australia

I tried the following regex. It works for Input_3 but fails for Input_1:
"(^[A-Z][a-z]+)([A-Z]\w.*(?=[A-Z]))?((?<=[A-Z])\w.*$)"

Input_3: CountryINDIAAustralia
O/P: [('Country', 'INDIA', 'Australia')] #WORKS!!!
Input_1: CountryINDIA
O/P: [('Country', 'INDI', 'A')] #did not work :(
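To see how the attempted pattern behaves on every input, it can be run through Python's re.findall on all four strings. This is a sketch that assumes re.findall is how the pattern was applied (the question does not say which re function was used). The greedy .* in group 2 backtracks only until the lookahead (?=[A-Z]) succeeds, which on Input_1 happens just before the final "A", so group 2 gets "INDI" and group 3 gets "A". Group 3 also needs an uppercase letter immediately before it (the lookbehind), and in CountryIndia and CountryIndiaAustralia no position satisfies both that and group 2's lookahead, so those inputs do not match at all.

```python
import re

# The pattern from the question, unchanged.
ATTEMPT = re.compile(r"(^[A-Z][a-z]+)([A-Z]\w.*(?=[A-Z]))?((?<=[A-Z])\w.*$)")

inputs = [
    "CountryINDIA",           # Input_1 -> [('Country', 'INDI', 'A')]
    "CountryIndia",           # Input_2 -> [] (no match)
    "CountryINDIAAustralia",  # Input_3 -> [('Country', 'INDIA', 'Australia')]
    "CountryIndiaAustralia",  # Input_4 -> [] (no match)
]

for s in inputs:
    # findall returns one tuple per match, with one entry per capture group
    print(s, ATTEMPT.findall(s))
```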


Solution

  • You can use the following regex, which matches either of the following:

    • [A-Z][a-z]+ Uppercase letter followed by one or more lowercase letters
    • [A-Z]+(?![a-z]) One or more uppercase letters not followed by a lowercase letter


    (?:[A-Z][a-z]+|[A-Z]+(?![a-z]))
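A minimal Python sketch of applying this pattern with re.findall; the helper name split_words is hypothetical and not part of the answer. Because the group is non-capturing, findall returns each whole match, so joining the results with ", " gives the O/P format from the question. The last line shows why the (?![a-z]) lookahead is needed: without it, [A-Z]+ greedily takes the "A" that starts "Australia".

```python
import re

# The answer's pattern: either a capitalized word such as "Country" or "India",
# or a run of capitals such as "INDIA" that is not followed by a lowercase letter.
WORD = re.compile(r"(?:[A-Z][a-z]+|[A-Z]+(?![a-z]))")


def split_words(s):
    """Hypothetical helper: return the words of s, keeping all-caps runs whole."""
    return WORD.findall(s)


for s in ["CountryINDIA", "CountryIndia", "CountryINDIAAustralia", "CountryIndiaAustralia"]:
    print(s, "->", ", ".join(split_words(s)))

# Without the (?![a-z]) lookahead, [A-Z]+ swallows the "A" of "Australia":
print(re.findall(r"[A-Z][a-z]+|[A-Z]+", "CountryINDIAAustralia"))  # ['Country', 'INDIAA']
```

The loop prints "CountryINDIA -> Country, INDIA", "CountryIndia -> Country, India", "CountryINDIAAustralia -> Country, INDIA, Australia" and "CountryIndiaAustralia -> Country, India, Australia".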