
Python *args returns tuple rather than pandas dataframes


I have a function that returns several DataFrames.

def frames():
    ...  # build df1 through df4 here
    return df1, df2, df3, df4

I would like to write a function that appends these frames together without my having to hard-code how many there are, so that I can return more or fewer frames in the future.

def appender(*args):
    # for each frame in args:
    #     if it passes the condition, append it to the result
    ...

I'd like to be able to call it such that

appender(frames())

will return a single frame made up of the frames that passed the condition.

Right now the frames() function returns a tuple of four frames. Is there any easy fix to unpack the tuple?

Thanks for any help!

Clem

UPDATE: Here's an example

import pandas as pd

def frames():
    df1 = pd.DataFrame()
    df2 = pd.DataFrame()
    df3 = pd.DataFrame(['not', 'empty'])
    df4 = pd.DataFrame(['not', 'empty'])
    return df1, df2, df3, df4

def appender(*args):
    main_frame = pd.DataFrame()
    for arg in args:
        if arg.empty != True:
            assignment_frame = assignment_frame.append(arg)

    return assignment_frame


appender(frames())

gives


AttributeError                            Traceback (most recent call last)
in ()
----> 1 appender(frames())

in appender(*args)
      2     main_frame = pd.DataFrame()
      3     for arg in args:
----> 4         if arg.empty != True:
      5             assignment_frame = assignment_frame.append(arg)
      6

AttributeError: 'tuple' object has no attribute 'empty'


Solution

  • Your original code would get past this error if you called it via appender(*frames()), which unpacks the tuple into four separate arguments. You would then hit a different error (an UnboundLocalError), because assignment_frame should be main_frame.

    However, there is an even simpler approach. Just pass a collection of dataframes and use a list comprehension with your condition to filter them.

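    Note that you do NOT want to build dataframes by appending! This is a quadratic copy: each call to append returns a new dataframe containing a copy of everything accumulated so far plus the newly appended rows, so the total work grows quadratically with the number of frames. This will get very slow. See timings below. DataFrame.append was also deprecated in pandas 1.4 and removed in pandas 2.0, so pd.concat is the only option on current versions.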

    def appender(dataframes):
        return pd.concat([df for df in dataframes if not df.empty])  # Optional: .reset_index()
    
    
    >>> appender(frames())
           0
    0    not
    1  empty
    0    not
    1  empty
    
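If you would rather keep the *args signature from the question, only the call site needs to change: unpack the tuple with *. A minimal sketch, assuming the frames() from the question. The guard for the case where nothing passes the condition is my own addition, since pd.concat raises a ValueError when handed an empty list.

```python
import pandas as pd


def frames():
    # Same frames as in the question: two empty, two non-empty.
    df1 = pd.DataFrame()
    df2 = pd.DataFrame()
    df3 = pd.DataFrame(['not', 'empty'])
    df4 = pd.DataFrame(['not', 'empty'])
    return df1, df2, df3, df4


def appender(*args):
    # Keep only the frames that pass the condition (non-empty here).
    kept = [df for df in args if not df.empty]
    # Assumption: return an empty frame when nothing passes, because
    # pd.concat raises ValueError on an empty list.
    if not kept:
        return pd.DataFrame()
    return pd.concat(kept)


# The * unpacks the returned tuple into four separate arguments.
result = appender(*frames())
```

Without the *, args is a one-element tuple whose only item is the tuple of frames, which is where the original AttributeError comes from.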

    Timings (concat vs append)

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randn(10, 10))
    
    %timeit df2 = pd.concat([df] * 1000)
    # 10 loops, best of 3: 54.7 ms per loop
    
    %%timeit
    df3 = pd.DataFrame()
    for _ in range(1000):
        df3 = df3.append(df)
    # 1 loop, best of 3: 1.28 s per loop
    
    >>> df3.equals(df2)
    True