I have a list of dataframes. Each dataframe was originally numerical data taken from which are all shaped identically with 21 rows and 5 columns. The first column is an index (index 0 to index 20). I want to compute the average (mean) values into a single dataframe. Then I want to export the dataframe to excel.
Here's a simplified version of my existing code:
#look to concatenate the dataframes together all at once
#dataFrameList is the given list of dataFrames
concatenatedDataframes = pd.concat(dataFrameList, axis = 1)
#grouping the dataframes by the index, which is the same across all of the dataframes
groupedByIndex = concatenatedDataframes.groupby(level = 0)
#take the mean
meanDataFrame = groupedByIndex.mean()
# Create a Pandas Excel writer using openpyxl as the engine.
writer = pd.ExcelWriter(filepath, engine='openpyxl')
meanDataFrame.to_excel(writer)
However, when I open the excel file, I see what looks like EVERY dataframe is copied into the sheet and the average/mean values are not shown. A simplified example is shown below (cutting most of the rows and dataframes)
Dataframe 1 Dataframe 2 Dataframe 3
Index Col2 Col3 Col4 Col5 Col2 Col3 Col4 Col5 Col2 Col3 Col4 Col5
0 Data Data Data Data Data Data Data Data Data Data Data Data
1 Data Data Data Data Data Data Data Data Data Data Data Data
2 Data Data Data Data Data Data Data Data Data Data Data Data
....
I'm looking for something more like:
Averaged DF
Index Col2 Col3 Col4
0 Mean Index0,Col2 across DFs Mean Index0,Col3 across DFs Mean Index0,Col4 across DFs
1 Mean Index1,Col2 across DFs Mean Index1,Col3 across DFs Mean Index1,Col4 across DFs
2 Mean Index2,Col2 across DFs Mean Index2,Col3 across DFs Mean Index3,Col4 across DFs
...
I have also already seen this answer: Get the mean across multiple Pandas DataFrames
If possible, I'm looking for a clean solution, not one which would simply involve looping through each dataFrame value by value. Any suggestions?
Perhaps I misunderstood what you asked
The solution is simple. You just need to concat along the correct axis
df1 = pd.DataFrame(index=range(rows), columns=range(columns), data=[[10 + i * j for j in range(columns)] for i in range(rows) ])
df2 = df1 = pd.DataFrame(index=range(rows), columns=range(columns), data=[[i + j for j in range(columns)] for i in range(rows) ])
ps. this should be your job as OP
df_concat0 = pd.concat((df1, df2), axis=1)
puts all the dataframes next to eachother.
0 1 0 1
0 10 10 0 1
1 10 11 1 2
2 10 12 2 3
If we want to do a groupby now, we first need to stack, groupby and stack again
df_concat0.stack().groupby(level=[0,1]).mean().unstack()
0 1
0 5.0 5.5
1 5.5 6.5
2 6.0 7.5
If we do
df_concat = pd.concat((df1, df2))
This puts all the dataframes on top of eachother
0 1
0 10 10
1 10 11
2 10 12
0 0 1
1 1 2
2 2 3
now we need to just groupby the index, like you did
df_concat.groupby(level=0).mean()
0 1
0 5.0 5.5
1 5.5 6.5
2 6.0 7.5
and then use ExcelWriter
as context manager
with pd.ExcelWriter(filepath, engine='openpyxl') as writer:
result.to_excel(writer)
or just plain
result.to_excel(filepath, engine='openpyxl')
if you can overwrite what is is filepath