Tags: python, string, list, camelcasing

How to separate an irregularly cased string into words? - Python


I have the following word list.

Not all of my words are delimited by capital letters. The list contains words such as 'USA', and I am not sure how to handle them: 'USA' should stay as one word and must not be split.

myList=[u'USA',u'Chancellor', u'currentRank', u'geolocDepartment', u'populationUrban', u'apparentMagnitude', u'Train', u'artery',
       u'education', u'rightChild', u'fuel', u'Synagogue', u'Abbey', u'ResearchProject', u'languageFamily', u'building',
       u'SnookerPlayer', u'productionCompany', u'sibling', u'oclc', u'notableStudent', u'totalCargo', u'Ambassador', u'copilote',
       u'codeBook', u'VoiceActor', u'NuclearPowerStation', u'ChessPlayer', u'runwayLength', u'horseRidingDiscipline']

How can I edit the elements in the list?
I would like to change the elements as shown below:

 updatemyList=[u'USA',u'Chancellor', u'current Rank', u'geoloc Department', u'population Urban', u'apparent Magnitude', u'Train', u'artery',
           u'education', u'right Child', u'fuel', u'Synagogue', u'Abbey', u'Research Project', u'language Family', u'building',
           u'Snooker Player', u'production Company', u'sibling', u'oclc', u'notable Student', u'total Cargo', u'Ambassador', u'copilote',
           u'code Book', u'Voice Actor', u'Nuclear Power Station', u'Chess Player', u'runway Length',  u'horse Riding Discipline']

That is, each camel-cased word should be separated at its capital letters.


Solution

  • You could use re.sub

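    import re

    # Split before a capital that starts a capitalized word ("rPower" -> "r Power").
    # All-caps words such as "USA" contain no lowercase letter, so they never match.
    first_cap_re = re.compile(r'(.)([A-Z][a-z]+)')
    # Split a lowercase letter or digit from a capital that follows it ("tR" -> "t R").
    all_cap_re = re.compile(r'([a-z0-9])([A-Z])')


    def convert(word):
        # Run both passes; each inserts a space at the boundaries it detects.
        s1 = first_cap_re.sub(r'\1 \2', word)
        return all_cap_re.sub(r'\1 \2', s1)


    updated_words = [convert(word) for word in myList]
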

    Adapted from: Elegant Python function to convert CamelCase to snake_case?
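
As a possible variation on the two-pass approach above, the same boundaries can probably be found in one pass with zero-width lookarounds. This is only a sketch: `split_camel` and `_BOUNDARY` are names made up for illustration, and 'HTTPResponse' is an invented input (not from the question) added to show an acronym followed by a capitalized word.

```python
import re

# Zero-width boundary, matched in either of two situations:
#   1. a lowercase letter or digit is followed by a capital   ("t|R")
#   2. a capital is followed by a capital plus a lowercase    ("HTTP|Response")
# Nothing is consumed, so sub() only inserts the space.
_BOUNDARY = re.compile(r'(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])')


def split_camel(word):
    """Insert a space at each camel-case boundary (hypothetical helper)."""
    return _BOUNDARY.sub(' ', word)


# A few entries from the question's list, plus one invented acronym case.
sample = ['USA', 'currentRank', 'NuclearPowerStation', 'oclc', 'HTTPResponse']
result = [split_camel(w) for w in sample]
print(result)
# ['USA', 'current Rank', 'Nuclear Power Station', 'oclc', 'HTTP Response']
```

An all-caps word like 'USA' has no lowercase letter after any of its capitals, so neither alternative matches and it stays in one piece.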