Search code examples
pythonregexuppercaselowercase

Split string at upper/lower case boundaries


I would like to split the following string at the upper/lower-case boundaries. How might I do this in Python and/or with a regex?

For example,

x = 'aagaaggagatataccATGAATTTGTCGGTTTACCCCAATTTAACCAAAgaaaacctgtacaa'

split_boundaries(x) = ['aagaaggagatatacc', 
                       'ATGAATTTGTCGGTTTACCCCAATTTAACCAAA',
                       'gaaaacctgtacaa']

Solution

  • Use re.findall:

    import re
    x = 'aagaaggagatataccATGAATTTGTCGGTTTACCCCAATTTAACCAAAgaaaacctgtacaa'
    
    re.findall(r'[a-z]+|[A-Z]+', x)
    # ['aagaaggagatatacc', 'ATGAATTTGTCGGTTTACCCCAATTTAACCAAA', 'gaaaacctgtacaa']