python python-3.x pandas camelcasing

Camel Casing and Underscore addition in filename with text, number and date


I'm relatively new to Python and pandas, so I'd appreciate some input. I have multiple files whose names combine text, numbers and a date. I want to rename them to a standard format: each word capitalized, words joined with underscores, and extra white space trimmed. For example:

FileName- ARA Inoc Start Times V34 20200418.xlsx to be named as Ara_Inoc_Start_Time_V34_20200418.xlsx

FileName- Batch Start Time V3 20200418.xlsx to be named as Batch_Start_Time_V3_20200418.xlsx

The challenges I'm facing are: 1) How do I add an underscore before the date? 2) For a filename containing words like ARA Inoc Start, my code produces A_R_A _Inoc _Start. How can I adapt it to give Ara_Inoc? This would involve trimming the white space as well. How can I add this to my current code?

import os

def change_case(name):
    # Start with the first character upper-cased.
    res = [name[0].upper()]
    for c in name[1:]:
        # Insert an underscore before every upper-case letter.
        if c in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ':
            res.append('_')
            res.append(c)
        else:
            res.append(c)

    return ''.join(res)

# Driver code

for filename in os.listdir("C:\\Users\\t\\Documents\\DummyData\\"):
    print(change_case(filename))

Solution

  • Split the string on white space using str.split(), capitalize each word (str.upper() on the first letter, str.lower() on the rest), then join the words with underscores using str.join()

    import os
    for filename in [
        ' ARA Inoc Start Times V34 20200418.xlsx  ', 
        ' Batch_Start_Time_V3_20200418.xlsx '
    ]:  #  os.listdir('C:\\Users\\t\\Documents\\DummyData\\')
        new_filename = '_'.join([i[:1].upper()+i[1:].lower() for i in filename.strip().split()])
        print(new_filename) 
    
    

    Output:

    Ara_Inoc_Start_Times_V34_20200418.xlsx
    Batch_start_time_v3_20200418.xlsx
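
The second output line is lower-cased after its first letter because that input contains no white space, so split() returns it as a single token. A small sketch comparing the manual capitalization with str.title() on such tokens (the helper name cap_first is hypothetical, introduced here only for illustration):

```python
def cap_first(token):
    # Same expression as in the snippet above:
    # upper-case the first character, lower-case the rest.
    return token[:1].upper() + token[1:].lower()

for token in ['Batch_Start_Time_V3_20200418.xlsx', '20200418.xlsx']:
    # Left: manual capitalization. Right: str.title(), which capitalizes
    # every letter that follows a non-letter, including the one after '.'.
    print(cap_first(token), '|', token.title())
```
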
    

    Note the use of i[:1].upper()+i[1:].lower() instead of str.title(). You can use the latter, but it also title-cases the file extension (.xlsx becomes .Xlsx), which is why I used the expression above. Alternatively, you can split the filename from the extension before doing the conversion:

    import os
    for filename in [
        ' ARA Inoc Start Times V34 20200418.xlsx  ', 
        ' Batch_Start_Time_V3_20200418.xlsx '
    ]:
        # Strip first so trailing spaces do not end up in the extension.
        name, ext = filename.strip().rsplit('.', 1)
        # title() capitalizes every letter that follows a non-letter,
        # so words already joined by underscores are handled too.
        name = '_'.join([i.title() for i in name.lower().split()])
        new_filename = '.'.join([name, ext])
        print(new_filename)
    

    Output:

    Ara_Inoc_Start_Times_V34_20200418.xlsx
    Batch_Start_Time_V3_20200418.xlsx
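
To apply this to real files, the same idea could be wrapped in a function and combined with os.rename(). The following is only a sketch under a few assumptions: the names normalize_filename and rename_all are hypothetical, os.path.splitext() is swapped in for the rsplit('.', 1) call above, and str.capitalize() replaces the manual slice expression (it upper-cases the first character and lower-cases the rest). No handling is included for name collisions or sub-directories.

```python
import os

def normalize_filename(filename):
    """Hypothetical helper: 'ARA Inoc Start.xlsx' -> 'Ara_Inoc_Start.xlsx'."""
    # Trim outer white space, then separate the stem from the extension.
    stem, ext = os.path.splitext(filename.strip())
    # split() with no argument also collapses runs of white space.
    words = [word.capitalize() for word in stem.split()]
    # The extension is appended unchanged.
    return '_'.join(words) + ext

def rename_all(directory):
    """Hypothetical driver: rename every entry in `directory` in place."""
    for old_name in os.listdir(directory):
        new_name = normalize_filename(old_name)
        if new_name != old_name:
            os.rename(os.path.join(directory, old_name),
                      os.path.join(directory, new_name))

print(normalize_filename(' ARA Inoc Start Times V34 20200418.xlsx  '))
print(normalize_filename('Batch Start Time V3 20200418.xlsx'))
```

Because the date is separated from the version by white space in the original names, the underscore before the date falls out of the split-and-join step without special handling. It is worth printing the old and new names first and only calling rename_all() once the mapping looks right.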