How to rename files according to a .CSV map


Data science roadblock here... I need to rename 972 files according to a .csv file that lists multiple attributes of those files.

The files and the CSV rows have one thing in common: the value of the Image Data ID column. This 6-digit number appears at the end of each file name, right before ".nii".

I have loaded the .csv file into a pandas DataFrame. Here's an example of what it looks like:

     Image Data ID  Subject     Group  Visit  Description
516  277576         027_S_2245  EMCI   4      ACCELERATED SAG IR-SPGR
525  342645         027_S_2183  EMCI   4      ACCELERATED SAG IR-SPGR
1    292394         131_S_0123  CN     26     Accelerated SAG IR-SPGR
3    475763         131_S_0123  CN     32     Accelerated SAG IR-SPGR
4    413872         131_S_0123  CN     30     Accelerated SAG IR-SPGR

The file names are collected in a list with:

files = os.listdir("path/to/files")

Here's an example of what the file names are like:

ADNI_098_S_4215_MR_Sag_IR-SPGR__br_raw_20130206130502189_10_S173103_I343697.nii
ADNI_094_S_2201_MR_Accelerated_SAG_IR-SPGR__br_raw_20120119112855332_188_S137442_I279199.nii
ADNI_127_S_4240_MR_Sag_IR-SPGR__br_raw_20120925151831011_194_S168683_I336697.nii

In essence, I want to match each file to its .CSV row through the value of Image Data ID and rename it like:

EMCI_027_S_2245_4_Accelerated.nii

or

CN_134_S_0233_32_Normal.nii

(Depending on whether Description contains the word Accelerated or not)

Any suggestions on how to approach this?


Solution

  • From what I can see, here's what I would suggest:

    - Make a function that extracts the Image Data ID from the old file name. If all your IDs have the same length (6 digits, from what I can see here), you can use name_file[-10:-4] to extract the ID from a string called name_file; if not, you can use a regex (find the number between "_I" and ".nii"). Let's call that function id_from_file_name.

    - Now let's call df_id the DataFrame obtained by loading your CSV file. Do:

    import os

    path = "path/to/files"  # folder that holds the .nii files
    # Index by the ID column so rows can be looked up with .loc
    df_calc = df_id.set_index('Image Data ID')

    def new_name_file(old_name):
        # The ID parsed from the file name is a string; convert it to int,
        # assuming pandas read the 'Image Data ID' column as integers
        image_id = int(id_from_file_name(old_name))
        if image_id not in df_calc.index:
            print(image_id, 'not in dataframe')
            return None
        row = df_calc.loc[image_id]
        if 'accelerated' in row['Description'].lower():
            scan_type = 'Accelerated'
        else:
            scan_type = 'Normal'
        return f"{row['Group']}_{row['Subject']}_{row['Visit']}_{scan_type}.nii"

    for old_name in os.listdir(path):
        new_name = new_name_file(old_name)
        if new_name is not None:
            os.rename(os.path.join(path, old_name), os.path.join(path, new_name))
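
Before renaming 972 files in place, it may be worth building the full list of renames first and checking that no two files end up with the same target name (on Unix, os.rename silently replaces an existing file). Here is a minimal sketch of that idea; plan_renames, find_collisions and apply_renames are hypothetical helper names, and make_new_name stands for whatever function maps an old name to a new one (or to None when there is no match):

```python
import os
from collections import Counter

def plan_renames(file_names, make_new_name):
    """Return (old, new) pairs, skipping files for which make_new_name gives None."""
    plan = []
    for old_name in file_names:
        new_name = make_new_name(old_name)
        if new_name is not None:
            plan.append((old_name, new_name))
    return plan

def find_collisions(plan):
    """Return the target names that more than one source file maps to."""
    counts = Counter(new_name for _, new_name in plan)
    return sorted(name for name, n in counts.items() if n > 1)

def apply_renames(folder, plan):
    """Rename the files, but refuse to start if any target name is duplicated."""
    collisions = find_collisions(plan)
    if collisions:
        raise ValueError(f"duplicate target names: {collisions}")
    for old_name, new_name in plan:
        os.rename(os.path.join(folder, old_name), os.path.join(folder, new_name))
```

Printing the plan before calling apply_renames gives you a dry run, so you can inspect the old/new pairs without touching any file.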
    

    Edit: To create the function id_from_file_name, you can do:

    import re
    def id_from_file_name(name):
        # Capture the digits between the final "_I" and the ".nii" extension
        return re.search(r'_I(\d+)\.nii$', name).group(1)
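
To sanity-check the extraction, you can run it against the sample file names from the question. The sketch below compares a regex anchored on "_I" plus digits with the fixed-width slice; id_by_regex and id_by_slice are illustrative names, and the slice assumes the ID is always exactly 6 digits. Anchoring on "_I" matters because the names also contain an "I" in the "ADNI" prefix:

```python
import re

# Digits between the final "_I" and the ".nii" extension
ID_PATTERN = re.compile(r"_I(\d+)\.nii$")

def id_by_regex(name):
    """Return the ID as a string, or None if the name does not match."""
    match = ID_PATTERN.search(name)
    return match.group(1) if match else None

def id_by_slice(name):
    """Only valid when the ID is exactly 6 digits and the name ends in '.nii'."""
    return name[-10:-4]

samples = [
    "ADNI_098_S_4215_MR_Sag_IR-SPGR__br_raw_20130206130502189_10_S173103_I343697.nii",
    "ADNI_094_S_2201_MR_Accelerated_SAG_IR-SPGR__br_raw_20120119112855332_188_S137442_I279199.nii",
    "ADNI_127_S_4240_MR_Sag_IR-SPGR__br_raw_20120925151831011_194_S168683_I336697.nii",
]

for name in samples:
    print(id_by_regex(name), id_by_slice(name))
# prints:
# 343697 343697
# 279199 279199
# 336697 336697
```

The regex version returns None for names that do not follow the pattern, which makes it easier to skip stray files in the folder.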