python pandas dataframe multiple-columns multi-index

Looping over dataframe using multiple columns throwing ValueError


I am trying to loop over the columns of a Pandas dataframe and access two columns at a time. My code works perfectly for a single column, but when I apply it to multiple columns it throws "ValueError: too many values to unpack (expected 2)".

My code snippet is as follows:

for col1, col2 in df.columns:
    if col1.startswith('ColumnName1') and col2.startswith('ColumnName2') and df[col2].notnull()*1:
        new_df = df.groupby([col1, col2]).agg({'ColumnName3': 'unique'}).reset_index()
    elif col1.startswith('ColumnName1') and col2.startswith('ColumnName2') and not df[col2].notnull()*1:
        new_df = df.groupby(col1).agg({'ColumnName3': 'unique'}).reset_index()

The small problem is that the column names are very long and not under my control: this dataframe has multi-header columns, so merging the header levels produces names with arbitrary filler text. Hence the ".startswith" checks. The real column names are much longer than the ones shown.

I am trying to group column 3 by columns 1 and 2 when column 2 is not null, and by column 1 alone when column 2 is null.

Can anyone tell me where I am going wrong, or what I am missing?


Solution

  • If you want to loop over df.columns two values at a time, you can do so like this:

    import itertools as it
    
    for col1, col2 in it.zip_longest(*[iter(df.columns)]*2):
        ...
    

    We create a list containing one iterator over the columns, and then apply *2 to get a list that holds that same iterator object twice. Because it is the same iterator, every time zip_longest pulls a value from one slot it also advances the other. Zipping therefore turns a flat sequence of len(df.columns) labels into len(df.columns) // 2 tuples of 2 labels each, plus one final tuple padded with None when the number of columns is odd.
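The pairing idiom can be checked without a dataframe. Below is a minimal sketch, assuming the column labels are plain strings (as they would be after flattening a multi-header). The `columns` list and the `in_pairs` helper are hypothetical names invented for illustration. They stand in for `df.columns`, which yields one label at a time when iterated. The sketch also shows the likely source of the original error: unpacking a single label into `col1, col2`.

```python
import itertools as it

# Hypothetical labels standing in for df.columns (assumption: flat string names).
columns = [
    "ColumnName1_a", "ColumnName2_a",
    "ColumnName1_b", "ColumnName2_b",
    "ColumnName3",
]


def in_pairs(labels):
    """Return consecutive, non-overlapping pairs; the last pair is padded with None."""
    iterator = iter(labels)
    # Both arguments are the SAME iterator, so each output tuple consumes two labels.
    # This is equivalent to it.zip_longest(*[iter(labels)] * 2).
    return it.zip_longest(iterator, iterator)


pairs = list(in_pairs(columns))

# Iterating the labels directly yields one string per step, so unpacking
# that string into two names fails unless it has exactly two characters.
try:
    col1, col2 = columns[0]
    unpack_error = None
except ValueError as exc:
    unpack_error = str(exc)
```

With an odd number of labels the last tuple is `("ColumnName3", None)`, so a loop body that calls `col2.startswith(...)` should check for `None` first.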