There are two dataframes:
df1
|   | name    | foo     | bar     |
|---|---------|---------|---------|
| 0 | value1  | value2  | value3  |
| 1 | value4  | value5  | value6  |
| 2 | value7  | value8  | value9  |
| 3 | value10 | value11 | value12 |
| 4 | value13 | value14 | value15 |
df2
|   | name     | foo      | bar      |
|---|----------|----------|----------|
| 0 | value10  | value20  | value30  |
| 1 | value40  | value50  | value60  |
| 2 | value70  | value80  | value90  |
| 3 | value100 | value110 | value120 |
| 4 | value130 | value140 | value150 |
How can I create dictionaries from the two dataframes and then concatenate them into the result shown in the example below? The real dataframes have 10,000 rows and 100 columns, and they are generated in real time inside a loop. Most likely I cannot add them all to a list at once; I can only add them to each other gradually, one iteration at a time. If incremental concat is the only viable approach, how do I avoid constantly copying the frames?
|   | name     | foo      | bar      |
|---|----------|----------|----------|
| 0 | value1   | value2   | value3   |
| 1 | value4   | value5   | value6   |
| 2 | value7   | value8   | value9   |
| 3 | value10  | value11  | value12  |
| 4 | value13  | value14  | value15  |
| 0 | value10  | value20  | value30  |
| 1 | value40  | value50  | value60  |
| 2 | value70  | value80  | value90  |
| 3 | value100 | value110 | value120 |
| 4 | value130 | value140 | value150 |
I am basing this on the article "Why does concatenation of DataFrames get exponentially slower?" and this benchmark: https://perfpy.com/16#/
I have tried the following without success. The problem is that I do not really understand how to construct a dictionary from a dataframe correctly in my case.
```python
import pandas as pd

rows = []
df_a = df1
df_a = df_a.to_dict('dict')  # 'dict' orientation gives {column: {index: value}}
rows.append(df_a)
df_a = df2
df_a = df_a.to_dict('dict')
rows.append(df_a)
# rows is now a list of two nested dicts, so this builds a frame whose cells
# are dict objects instead of restoring the original rows
df = pd.DataFrame(rows)
```
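For comparison, here is a minimal sketch of what the dictionary route could look like with the 'records' orientation instead of 'dict'. The orientation and the loop are assumptions about what was intended, not a confirmed fix:

```python
import pandas as pd

rows = []  # flat list of per-row dictionaries
for df in (df1, df2):  # in the real code, the ~40 generated frames
    # 'records' returns one dict per row, e.g. {'name': 'value1', 'foo': 'value2', ...};
    # note that the original index is not preserved with this orientation
    rows.extend(df.to_dict('records'))

# a single DataFrame construction at the very end, so nothing is copied repeatedly
df = pd.DataFrame(rows)
```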
Ultimately, I want to loop over about 40 dataframes like df1 and df2 (with different values), collect them as dictionaries, and combine everything into a single dataframe at the end.
I would do it like this:
```python
list_dfs = []
for df in [df1, df2]:
    list_dfs.append(df)
df_output = pd.concat(list_dfs)
```
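Applied to the real workload, that pattern looks roughly like the sketch below. `generate_frame()` is a placeholder for however each frame is actually produced in the loop; appending to the list only stores a reference, and the one expensive copy happens inside the single `pd.concat` call at the end:

```python
import pandas as pd

list_dfs = []
for i in range(40):               # ~40 frames generated in real time
    df_i = generate_frame(i)      # placeholder for the real generation step
    list_dfs.append(df_i)         # storing a reference, no copying here

# one concat copies each frame exactly once; by default the original 0-4
# indices are kept, as in the desired output (use ignore_index=True otherwise)
df_output = pd.concat(list_dfs)
```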
As opposed to doing something like this, which is memory intensive and slow for large numbers of dataframes, because every iteration re-copies all rows accumulated so far:
```python
df_out = pd.DataFrame()
for df in [df1, df2]:
    df_out = pd.concat([df_out, df])
    # or (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0):
    # df_out = df_out.append(df)
```
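To put rough numbers on the difference: with 40 frames of 10,000 rows each, the incremental pattern copies about 10,000 · (1 + 2 + … + 40) = 8,200,000 rows over the whole loop, while collecting the frames in a list and calling pd.concat once copies only the final 400,000 rows.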