python pandas dataframe enumerate

Enumerate in a 'for' loop


I want to add a column named X to every .tsv file. The column should hold the entry of the folder_names list at the same index as the file (one folder_names value per .tsv file). But the inner enumerate loop runs in full on every iteration of the outer for loop, so column 'X' always gets the last value of folder_names instead of the corresponding one.

I have these two lists:

all_files_tsv = [tsv_file_1, tsv_file_2.... tsv_file_n]

folder_names = [folder_name_1, folder_name_2.... folder_name_n]

And the output desired is the following:

tsv_file_1:

Column1 Column2 X
1 A folder_name_1
2 B folder_name_1
3 C folder_name_1

tsv_file_2:

Column1 Column2 X
1 --- folder_name_2
2 --- folder_name_2
3 --- folder_name_2

And this is the code that I have right now:

for file_ in all_files_tsv:
    df = pd.read_csv(file_,sep = '\t', header=0)
    for index, names in enumerate(folder_names):
        df['X'] = names

Any idea how I could solve this?


Solution

  • You don't need enumerate(). You can iterate over all_files_tsv and folder_names in parallel using zip() to get corresponding elements.

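    import pandas as pd

    # Pair each file with its folder name and store that name in column 'X'.
    for file, name in zip(all_files_tsv, folder_names):
        df = pd.read_csv(file, sep='\t', header=0)
        df['X'] = name
        # Keep the header row and leave out the index so the file layout is preserved.
        df.to_csv(file, sep='\t', index=False)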
    

    Nested loops are used when you want a cross product between two lists; zip() is used when you want to pair corresponding elements.
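
To see the difference between the two approaches without touching any files, here is a minimal sketch in plain Python. The list contents are hypothetical stand-ins for the real paths and folder names in the question, and the dictionaries nested_result and zipped_result are only there to record which name each file ends up with.

```python
# Hypothetical stand-ins for the two lists in the question.
all_files_tsv = ["tsv_file_1", "tsv_file_2", "tsv_file_3"]
folder_names = ["folder_name_1", "folder_name_2", "folder_name_3"]

# Nested loops: the inner loop runs to completion for every file,
# so each file is left holding the last folder name.
nested_result = {}
for file in all_files_tsv:
    for name in folder_names:
        nested_result[file] = name  # overwritten on every inner iteration

# zip(): one (file, name) pair per step, matched by position.
zipped_result = {}
for file, name in zip(all_files_tsv, folder_names):
    zipped_result[file] = name

print(nested_result)
print(zipped_result)
```

Note that zip() stops at the end of the shorter list, so a length mismatch would silently skip files. On Python 3.10 or later, zip(all_files_tsv, folder_names, strict=True) raises a ValueError in that case instead.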