I have a function that returns several DataFrames.
def frames():
    # ... build the frames ...
    return df1, df2, df3, df4
I would like to write a function that appends these frames together without my having to hard-code how many there are, so that frames() can return more or fewer frames in the future.
def appender(*args):
    # condition goes here
    # append the frames for which the condition is true
I'd like to be able to call it such that
appender(frames())
will return a single frame made up of the frames that passed the condition.
Right now the frames() function returns a tuple of four frames. Is there any easy fix to unpack the tuple?
Thanks for any help!
Clem
UPDATE: Here's an example.
import pandas as pd

def frames():
    df1 = pd.DataFrame()
    df2 = pd.DataFrame()
    df3 = pd.DataFrame(['not', 'empty'])
    df4 = pd.DataFrame(['not', 'empty'])
    return df1, df2, df3, df4

def appender(*args):
    main_frame = pd.DataFrame()
    for arg in args:
        if arg.empty != True:
            assignment_frame = assignment_frame.append(arg)
    return assignment_frame

appender(frames())
gives
AttributeError                            Traceback (most recent call last)
in ()
----> 1 appender(frames())

in appender(*args)
      2     main_frame = pd.DataFrame()
      3     for arg in args:
----> 4         if arg.empty != True:
      5             assignment_frame = assignment_frame.append(arg)
      6

AttributeError: 'tuple' object has no attribute 'empty'
Your original code would kind of work if you called it via appender(*frames()), where the star unpacks the returned tuple into separate arguments, but you would still get an error because assignment_frame should be main_frame.
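To make the unpacking concrete, here is a minimal sketch that keeps the *args signature from the question. The name appender_varargs is hypothetical, and it collects the non-empty frames and concatenates them once rather than calling append, since DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd


def frames():
    # Same shape as the question's example: two empty frames, two non-empty.
    return (
        pd.DataFrame(),
        pd.DataFrame(),
        pd.DataFrame(["not", "empty"]),
        pd.DataFrame(["not", "empty"]),
    )


def appender_varargs(*args):
    # Hypothetical name; each positional argument is expected to be a DataFrame.
    kept = [arg for arg in args if not arg.empty]
    # pd.concat raises ValueError on an empty list, so fall back to an empty frame.
    return pd.concat(kept) if kept else pd.DataFrame()


# The star unpacks the returned tuple into four positional arguments.
result = appender_varargs(*frames())

# Without the star, args is a 1-tuple holding the whole tuple of frames,
# which reproduces the AttributeError from the question.
try:
    appender_varargs(frames())
    raised = False
except AttributeError:
    raised = True
```

Here result holds the four rows from the two non-empty frames, and raised is True because a tuple has no .empty attribute.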
However, there is an even simpler approach. Just pass a collection of dataframes and use a list comprehension with your condition to filter them.
Note that YOU DO NOT WANT TO BUILD DATAFRAMES BY APPENDING! This is called quadratic copy, because each call to append returns a new dataframe containing a copy of everything accumulated so far plus the newly appended rows. This gets very slow as the number of frames grows. See timings below.
def appender(dataframes):
    return pd.concat([df for df in dataframes if not df.empty])  # Optional: .reset_index()
>>> appender(frames())
       0
0    not
1  empty
0    not
1  empty
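Two details may be worth handling, depending on your data. The output above keeps each frame's original index (0, 1, 0, 1), and pd.concat raises a ValueError if every frame is filtered out. The sketch below is one possible variant, with the hypothetical name appender_safe. It uses ignore_index=True to renumber the rows, which avoids the extra "index" column that a bare .reset_index() would add unless you pass drop=True.

```python
import pandas as pd


def appender_safe(dataframes):
    # Hypothetical variant of appender: renumbers rows and tolerates
    # the case where no frame passes the condition.
    kept = [df for df in dataframes if not df.empty]
    if not kept:
        # pd.concat([]) would raise ValueError, so return an empty frame instead.
        return pd.DataFrame()
    return pd.concat(kept, ignore_index=True)


combined = appender_safe(
    [pd.DataFrame(), pd.DataFrame(["not", "empty"]), pd.DataFrame(["not", "empty"])]
)
nothing = appender_safe([pd.DataFrame(), pd.DataFrame()])
```

Whether returning an empty frame is the right fallback is a design choice; raising an error may be preferable if an all-empty input signals a problem upstream.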
Timings (concat vs append)
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 10))

%timeit df2 = pd.concat([df] * 1000)
# 10 loops, best of 3: 54.7 ms per loop

%%timeit
df3 = pd.DataFrame()
for _ in range(1000):
    df3 = df3.append(df)
# 1 loop, best of 3: 1.28 s per loop
>>> df3.equals(df2)
True
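The timings above use IPython magics and DataFrame.append, which was removed in pandas 2.0. As a rough sketch for a plain Python session on newer pandas, the same copy-on-every-step pattern can be reproduced with pd.concat inside the loop. The function names and the smaller repeat count n are assumptions made for illustration, and absolute timings will differ by machine.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 10))
n = 200  # smaller than the 1000 used above so the sketch runs quickly


def concat_once():
    # Build the list first, then copy the data a single time.
    return pd.concat([df] * n)


def grow_in_loop():
    # Same pattern as repeated append: every step copies all rows so far.
    out = df
    for _ in range(n - 1):
        out = pd.concat([out, df])
    return out


t_once = timeit.timeit(concat_once, number=3)
t_loop = timeit.timeit(grow_in_loop, number=3)
print(f"single concat: {t_once:.4f} s, concat in loop: {t_loop:.4f} s")

# Both approaches produce the same frame; only the amount of copying differs.
same = concat_once().equals(grow_in_loop())
```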