Tags: python, pandas, dataframe, multi-index

Adding an Average Column to a Pandas Multiindex Dataframe


I have a dataframe df:

first        bar                 baz           
second       one       two       one       two 
A       0.487880 -0.487661 -1.030176  0.100813 
B       0.267913  1.918923  0.132791  0.178503
C       1.550526 -0.312235 -1.177689 -0.081596 

I'd like to add an average column and then move it to the front:

df['Average'] = df.mean(level='second', axis='columns')  #ERROR HERE
cols = df.columns.tolist()
df = df[[cols[-1]] + cols[:-1]]

I get the error:

ValueError: Wrong number of items passed 2, placement implies 1

Maybe I could add each column of the mean one at a time (df['Average', 'one'] = ...), but that seems silly, especially as the real-life index is more complicated.

Edit: (Frame Generation)

import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=index)

Solution

  • I'm not sure what your target output is. Something like this?

    df2 = df.mean(level='second', axis='columns')
    df2.columns = pd.MultiIndex.from_tuples([('mean', col) for col in df2])
    >>> df2
           mean          
            one       two
    A -0.271148 -0.193424
    B  0.200352  1.048713
    C  0.186419 -0.196915
    
    >>> pd.concat([df2, df], axis=1)
           mean                 bar                 baz          
            one       two       one       two       one       two
    A -0.271148 -0.193424  0.487880 -0.487661 -1.030176  0.100813
    B  0.200352  1.048713  0.267913  1.918923  0.132791  0.178503
    C  0.186419 -0.196915  1.550526 -0.312235 -1.177689 -0.081596
    

    You are getting the error because your mean operation returns a dataframe (with two columns in this case), and you are then trying to assign that result to a single column of the original dataframe.
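
A note for readers on newer pandas: the `level` argument of `DataFrame.mean` was deprecated and then removed in pandas 2.0, so the call above fails there. The following is a minimal sketch of the same idea using a transpose and `groupby`, not the original answer's code. The frame is rebuilt with random values, the seed is arbitrary, and the top-level label `'Average'` is taken from the question. It also prints the shape of the grouped mean, which shows why assigning it to the single key `df['Average']` cannot work.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (values are random;
# the seed is an arbitrary choice for reproducibility).
columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((3, 4)), index=["A", "B", "C"], columns=columns
)

# Group the columns by the 'second' level: transpose so the column
# MultiIndex becomes the row index, average each group, transpose back.
means = df.T.groupby(level="second").mean().T

# One column per 'second' label, so this is a two-column frame and
# cannot be stored under the single key df['Average'].
print(means.shape)

# Put the means under a new top-level label and place them in front.
means.columns = pd.MultiIndex.from_product(
    [["Average"], means.columns], names=list(df.columns.names)
)
result = pd.concat([means, df], axis=1)
print(result.columns.tolist())
```

Because the grouping uses only the level name, the same lines should carry over to a frame with more first-level labels, such as the eight-column one from the question's edit.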