
Python: how to iterate over N nested for loops?


I have a CSV file with N key columns and one column of expressions. Each expression references between 1 and N of the key columns, and I want every reference replaced with the value from the corresponding key column for that row. Hopefully the example below clarifies what I mean.

Key columns below are A,B,C

Table

Desired output:

20_A
20_B
30_A
30_B
40_C_4
40_C_5

My solution:

import pandas as pd

keys = ['Age', 'Type', 'Delay']
df = pd.read_csv(csv_path)  # csv_path: path to the CSV file
for index, row in df.iterrows():

    key1_list = row[keys[0]].split(",")
    key2_list = row[keys[1]].split(",")
    key3_list = row[keys[2]].split(",")

    expression = row['Expression']

    # Iterate over all combinations of key column values and export a chart for each one
    for KEY1 in key1_list:
        for KEY2 in key2_list:
            for KEY3 in key3_list:
                string = expression
                string = string.replace("<" + keys[0] + ">", KEY1)
                string = string.replace("<" + keys[1] + ">", KEY2)
                string = string.replace("<" + keys[2] + ">", KEY3)
                print(string)

However, I would like to generalize my code so that it works for any number of key columns and only the keys list at the beginning needs updating. This would require looping to depth len(keys), but I can't figure out how to generalize looping to an arbitrary depth with flat code. I looked at itertools but couldn't find what I need. I think recursion might work, but I would prefer to avoid it.


Solution

  • Recursion could of course solve the problem, but you should take another look at itertools before going down that path. What you want is the Cartesian product of the value lists of your keys, which generates every possible combination of key values. This is what itertools.product does.

    One way of achieving this is as follows:

    import pandas as pd
    import itertools
    
    csv_path = "path/to/file"
    # Read every cell as text so that .split(',') also works for numeric-looking
    # columns, and keep empty cells as '' instead of NaN.
    df = pd.read_csv(csv_path, dtype=str, keep_default_na=False)
    
    # Take the keys from the data frame instead of entering them manually.
    # This assumes "Expression" is the last column, since it is not a key.
    keys = list(df.columns[:-1])
    for index, row in df.iterrows():
    
        # Collect one list of values per key, in the same order as `keys`
        key_list = []
        for key in keys:
            # Multiple values in one cell are separated by ','. In a CSV file
            # such a cell has to be quoted, e.g. "A,B".
            key_list.append(row[key].split(','))
    
        expression = row['Expression']
    
        # All key combinations are then generated with 'itertools.product'
        combos = itertools.product(*key_list)
    
        # Each combo is then handled separately
        for combo in combos:
            string = expression
            # Replace each <key> placeholder with its value from this combo.
            # A loop is needed because the number of keys is not fixed.
            for key, value in zip(keys, combo):
                string = string.replace('<' + key + '>', value)
            print(string)
    

    Hopefully this code is understandable and does what you want it to. Otherwise please let me know and I'll try to clarify further.
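
To see the core idea without pandas, here is a minimal sketch of the same technique. The function name `expand_expression` and the sample row values are made up for illustration; only the column names come from the question. It keeps the value lists in a dict, which assumes Python 3.7 or later, where dicts preserve insertion order.

```python
import itertools


def expand_expression(expression, values_by_key):
    """Return `expression` once per combination of key values.

    `values_by_key` maps each key name to a list of string values; every
    "<key>" placeholder in `expression` is replaced by the matching value.
    (Hypothetical helper, not part of pandas or itertools.)
    """
    keys = list(values_by_key)
    results = []
    # itertools.product(*lists) behaves like len(lists) nested for loops,
    # with the last list varying fastest.
    for combo in itertools.product(*values_by_key.values()):
        text = expression
        for key, value in zip(keys, combo):
            text = text.replace("<" + key + ">", value)
        results.append(text)
    return results


# Hypothetical row: the values are invented for this example.
row = {"Age": ["40"], "Type": ["C"], "Delay": ["4", "5"]}
for line in expand_expression("<Age>_<Type>_<Delay>", row):
    print(line)
```

Note that a key which lists several values but is not referenced in the expression still multiplies the number of combinations, so the same line is then produced more than once.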