Compute Average/Mean across Dataframes in Python Pandas

I have a list of dataframes. Each dataframe was originally numerical data taken from which are all shaped identically with 21 rows and 5 columns. The first column is an index (index 0 to index 20). I want to compute the average (mean) values into a single dataframe. Then I want to export the dataframe to excel.

Here's a simplified version of my existing code:

#look to concatenate the dataframes together all at once
#dataFrameList is the given list of dataFrames
concatenatedDataframes = pd.concat(dataFrameList, axis = 1)

#grouping the dataframes by the index, which is the same across all of the dataframes
groupedByIndex = concatenatedDataframes.groupby(level = 0)

#take the mean 
meanDataFrame = groupedByIndex.mean()

# Create a Pandas Excel writer using openpyxl as the engine.
writer = pd.ExcelWriter(filepath, engine='openpyxl')
meanDataFrame.to_excel(writer)

However, when I open the excel file, I see what looks like EVERY dataframe is copied into the sheet and the average/mean values are not shown. A simplified example is shown below (cutting most of the rows and dataframes)

              Dataframe 1                   Dataframe 2                   Dataframe 3
Index  Col2   Col3   Col4   Col5     Col2   Col3   Col4   Col5     Col2   Col3   Col4   Col5
0      Data   Data   Data   Data     Data   Data   Data   Data     Data   Data   Data   Data
1      Data   Data   Data   Data     Data   Data   Data   Data     Data   Data   Data   Data
2      Data   Data   Data   Data     Data   Data   Data   Data     Data   Data   Data   Data
....

I'm looking for something more like:

           Averaged DF
Index  Col2                                   Col3                                   Col4
0      Mean Index0,Col2 across DFs    Mean Index0,Col3 across DFs    Mean Index0,Col4 across DFs
1      Mean Index1,Col2 across DFs    Mean Index1,Col3 across DFs    Mean Index1,Col4 across DFs
2      Mean Index2,Col2 across DFs    Mean Index2,Col3 across DFs    Mean Index3,Col4 across DFs
...

I have also already seen this answer: Get the mean across multiple Pandas DataFrames

If possible, I'm looking for a clean solution, not one which would simply involve looping through each dataFrame value by value. Any suggestions?

Solution

Perhaps I misunderstood what you asked

The solution is simple. You just need to concat along the correct axis

dummy data

df1 = pd.DataFrame(index=range(rows), columns=range(columns), data=[[10 + i * j for j in range(columns)] for i in range(rows) ])
df2 = df1 = pd.DataFrame(index=range(rows), columns=range(columns), data=[[i + j for j in range(columns)] for i in range(rows) ])

ps. this should be your job as OP

pd.concat

df_concat0 = pd.concat((df1, df2), axis=1)

puts all the dataframes next to eachother.

    0   1   0   1
0   10  10  0   1
1   10  11  1   2
2   10  12  2   3

If we want to do a groupby now, we first need to stack, groupby and stack again

df_concat0.stack().groupby(level=[0,1]).mean().unstack()

    0   1
0   5.0     5.5
1   5.5     6.5
2   6.0     7.5

If we do

df_concat = pd.concat((df1, df2))

This puts all the dataframes on top of eachother

now we need to just groupby the index, like you did

df_concat.groupby(level=0).mean()

    0   1
0   5.0     5.5
1   5.5     6.5
2   6.0     7.5

and then use ExcelWriter as context manager

with pd.ExcelWriter(filepath, engine='openpyxl') as writer:
    result.to_excel(writer)

or just plain

result.to_excel(filepath, engine='openpyxl')

if you can overwrite what is is filepath