Data science roadblock here... I need to rename 972 files according to a .csv file that holds multiple attributes of those 972 files.
The CSV rows and the files have one thing in common: the value of the Image Data ID column.
In each file name this 6-digit number appears at the very end, right before ".nii".
I have loaded the .csv file into a Pandas dataframe. Here's an example of what it looks like:
     Image Data ID     Subject  Group  Visit              Description
516         277576  027_S_2245   EMCI      4  ACCELERATED SAG IR-SPGR
525         342645  027_S_2183   EMCI      4  ACCELERATED SAG IR-SPGR
1           292394  131_S_0123     CN     26  Accelerated SAG IR-SPGR
3           475763  131_S_0123     CN     32  Accelerated SAG IR-SPGR
4           413872  131_S_0123     CN     30  Accelerated SAG IR-SPGR
Perhaps more understandable in an image format:
The filenames are listed in a list, done with:
files = os.listdir("path/to/files")
Here's an example of what the file names are like:
ADNI_098_S_4215_MR_Sag_IR-SPGR__br_raw_20130206130502189_10_S173103_I343697.nii
ADNI_094_S_2201_MR_Accelerated_SAG_IR-SPGR__br_raw_20120119112855332_188_S137442_I279199.nii
ADNI_127_S_4240_MR_Sag_IR-SPGR__br_raw_20120925151831011_194_S168683_I336697.nii
Thus, in essence, what I want to do is match each file to its .csv row through the value of Image Data ID and rename it like:
EMCI_027_S_2245_4_Accelerated.nii
or
CN_134_S_0233_32_Normal.nii
(Depending on whether Description contains the word Accelerated or not)
Any suggestions on how to approach this?
From what I can see, here's what I would suggest:
- Make a function that extracts the image data ID from the old name of the file. If all your IDs have the same length (6 digits, from what I can see here), you can use name_file[-10:-4] to extract the ID from a string called name_file, since ".nii" takes up the last 4 characters. If not, you can use a regex that finds the number between the last "I" and ".nii". Let's call that function id_from_file_name.
- Now let df_id be the dataframe obtained by loading your .csv file. Then do:
import os

path = "path/to/files"  # directory that contains the .nii files

# Index by the ID column (use the exact column name from your .csv)
df_calc = df_id.set_index('Image Data ID')

def new_name_file(old_name):
    # Edit: related to the first comment, the ID is converted with int()
    image_id = int(id_from_file_name(old_name))
    if image_id not in df_calc.index:
        print(image_id, 'not in dataframe')
        return None
    row = df_calc.loc[image_id]
    if 'accelerated' in row['Description'].lower():
        scan_type = 'Accelerated'
    else:
        scan_type = 'Normal'
    return f"{row['Group']}_{row['Subject']}_{row['Visit']}_{scan_type}.nii"

for old_name in os.listdir(path):
    new_name = new_name_file(old_name)
    if new_name is not None:
        os.rename(os.path.join(path, old_name), os.path.join(path, new_name))
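One thing worth checking before renaming anything: two images that share the same Group, Subject, Visit and scan type would get the same new name, and on POSIX systems os.rename silently replaces an existing file. The following is only a sketch of a dry run that collects the planned names first and reports clashes. The helpers build_rename_plan and find_collisions, and the stand-in fake_new_name, are made up for illustration; in practice you would pass in new_name_file.

```python
from collections import defaultdict

def build_rename_plan(file_names, make_new_name):
    """Map each old name to its new name without touching the file system."""
    plan, skipped = {}, []
    for old_name in file_names:
        new_name = make_new_name(old_name)
        if new_name is None:
            skipped.append(old_name)  # no matching row, leave the file alone
        else:
            plan[old_name] = new_name
    return plan, skipped

def find_collisions(plan):
    """Return {new_name: [old_names]} for every new name used more than once."""
    by_target = defaultdict(list)
    for old_name, new_name in plan.items():
        by_target[new_name].append(old_name)
    return {new: olds for new, olds in by_target.items() if len(olds) > 1}

# Stand-in naming function (hypothetical, only here to make the sketch runnable)
def fake_new_name(old_name):
    return None if old_name.startswith("skip") else "same_target.nii"

plan, skipped = build_rename_plan(["a.nii", "b.nii", "skip.nii"], fake_new_name)
print(find_collisions(plan))
print(skipped)
```

With the real data this would be called as build_rename_plan(os.listdir(path), new_name_file), and the rename loop would only run if find_collisions(plan) comes back empty.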
Edit: To create that function id_from_file_name, you can do:
import re

def id_from_file_name(name):
    # Capture the digits between the last "I" and the ".nii" extension
    return re.search(r'I(\d+)\.nii$', name).group(1)
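As a quick sanity check, the two extraction approaches (fixed-width slicing and a regex) can be compared on the sample file names from the question. This is a minimal sketch: the helper names id_by_slice and id_by_regex are made up for illustration, and the slice assumes every ID is exactly 6 digits long.

```python
import re

# Sample file names copied from the question
samples = [
    "ADNI_098_S_4215_MR_Sag_IR-SPGR__br_raw_20130206130502189_10_S173103_I343697.nii",
    "ADNI_094_S_2201_MR_Accelerated_SAG_IR-SPGR__br_raw_20120119112855332_188_S137442_I279199.nii",
    "ADNI_127_S_4240_MR_Sag_IR-SPGR__br_raw_20120925151831011_194_S168683_I336697.nii",
]

def id_by_slice(name):
    # Hypothetical helper: assumes exactly 6 digits followed by ".nii"
    return name[-10:-4]

def id_by_regex(name):
    # Hypothetical helper: accepts any number of digits, returns None if nothing matches
    match = re.search(r"I(\d+)\.nii$", name)
    return match.group(1) if match else None

for name in samples:
    print(id_by_slice(name), id_by_regex(name))
```

Anchoring the pattern on digits and on the end of the string matters here, because the file names contain other capital "I" characters (for example in "ADNI" and "IR-SPGR") that a looser pattern would match first.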