Tags: python, pandas, functools

Reduce iterating over a zip list of dataframes


Consider the following code, which uses functools.reduce to concatenate a list of dataframes:

from functools import reduce
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'B': [3, 4]})
df3 = pd.DataFrame({'C': [5, 6]})

reduce(lambda x, y: pd.concat([x, y], axis=1), [df1, df2, df3])

This code works well. However, when I try the following, I get errors:

reduce(lambda x, y: pd.concat([x[0], y[0]], axis=1), zip([df1, df2, df3], [0, 1, 0]))

Could someone please help me understand why the second version fails?


Solution

  • Let's understand what's going on in reduce:

    # (func below is the lambda passed to reduce)
    # Iteration 1:
    # x = (df1, 0); y = (df2, 1)
    # func(x, y): pd.concat([x[0], y[0]], axis=1)  # okay
    # The result of `func(x, y)` is a dataframe, which becomes the new x for iteration 2

    # Iteration 2:
    # x = some_dataframe; y = (df3, 0)
    # func(x, y): pd.concat([x[0], y[0]], axis=1)  # error
    # Notice that x is no longer a tuple but a dataframe.
    # So x[0] is a column lookup, which raises a KeyError because the dataframe has no column named 0


    In case you are interested in how reduce works internally, here is a minimal implementation:

    def reduce(func, iterable):
        # Accept any iterable (e.g. a zip object), not only indexable sequences
        it = iter(iterable)
        try:
            result = next(it)
        except StopIteration:
            raise TypeError('Empty iterable') from None

        for item in it:
            result = func(result, item)

        return result
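
Given the explanation above, one way to make the zipped version work is to make sure the accumulator always has the shape the lambda expects. The sketch below shows two possible variants, under the assumption that only the concatenated dataframe is wanted at the end; the question does not say what the second list is for, so the name `flags` and the helper `concat_pairs` are hypothetical and used for illustration only.

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'B': [3, 4]})
df3 = pd.DataFrame({'C': [5, 6]})
flags = [0, 1, 0]  # second list from the question; its purpose is an assumption

# Variant 1: drop the flags before reducing, so x and y are always dataframes.
only_frames = (df for df, _flag in zip([df1, df2, df3], flags))
result_1 = reduce(lambda x, y: pd.concat([x, y], axis=1), only_frames)


# Variant 2: return a (dataframe, flag) tuple from every step, so that
# x[0] and y[0] stay valid on each iteration, then unpack at the end.
def concat_pairs(x, y):
    return pd.concat([x[0], y[0]], axis=1), y[1]


result_2, last_flag = reduce(concat_pairs, zip([df1, df2, df3], flags))
```

Variant 1 is simpler when the second element of each pair is not needed during the reduction. Variant 2 keeps it available inside the function at the cost of unpacking the final tuple.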