I want to add a column named X to every .tsv file. This column should hold the value at the corresponding index of the folder_names list (one folder_names value per .tsv file). But the enumerate loop repeats in full on every iteration of the outer for loop, so the column 'X' always gets the last value of names instead of the corresponding one.
I have these two lists:
all_files_tsv = [tsv_file_1, tsv_file_2.... tsv_file_n]
folder_names = [folder_name_1, folder_name_2.... folder_name_n]
The desired output is the following:
tsv_file_1:
Column1 | Column2 | X |
---|---|---|
1 | A | folder_name_1 |
2 | B | folder_name_1 |
3 | C | folder_name_1 |
tsv_file_2:
Column1 | Column2 | X |
---|---|---|
1 | --- | folder_name_2 |
2 | --- | folder_name_2 |
3 | --- | folder_name_2 |
And this is the code that I have right now:
for file_ in all_files_tsv:
    df = pd.read_csv(file_, sep='\t', header=0)
    for index, names in enumerate(folder_names):
        df['X'] = names
Any idea how I could solve this?
You don't need enumerate(). You can iterate over all_files_tsv and folder_names in parallel using zip() to get corresponding elements.
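import pandas as pd

for file, name in zip(all_files_tsv, folder_names):
    df = pd.read_csv(file, sep='\t', header=0)
    df['X'] = name
    # index=False keeps the row index out of the file; the header row is written by default
    df.to_csv(file, sep='\t', index=False)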
Nested loops are used when you want the cross product of two lists; zip() is used when you want to pair corresponding elements.
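To make the difference concrete, here is a small sketch that uses plain strings in place of real files. The list contents are placeholders chosen for illustration, and last_value is a hypothetical helper dict that mimics what happens to df['X'] in the nested version.

```python
# Placeholder names standing in for the real file paths and folder names.
all_files_tsv = ["tsv_file_1", "tsv_file_2", "tsv_file_3"]
folder_names = ["folder_name_1", "folder_name_2", "folder_name_3"]

# Nested loops visit every (file, name) combination: 3 x 3 = 9 pairs.
nested_pairs = [(f, n) for f in all_files_tsv for n in folder_names]

# Each inner loop overwrites the previous assignment, so only the
# last name survives for every file (the symptom from the question).
last_value = {}
for f in all_files_tsv:
    for n in folder_names:
        last_value[f] = n

# zip() pairs elements by position: 3 pairs, one name per file.
zipped_pairs = list(zip(all_files_tsv, folder_names))

print(len(nested_pairs))  # 9
print(last_value)         # every file maps to 'folder_name_3'
print(zipped_pairs)       # file i is paired with folder name i
```

Note that zip() stops at the shorter input, so if the two lists could ever differ in length, a file may be skipped silently. On Python 3.10 or later, zip(all_files_tsv, folder_names, strict=True) raises a ValueError in that case instead.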