Search code examples
pythonpandasdataframemulti-index

pandas.MultiIndex: assign all elements in first level


I have a dataframe with a multiindex, as per the following example:

dates = pandas.date_range(datetime.date(2020,1,1), datetime.date(2020,1,4))
columns = ['a', 'b', 'c']
index = pandas.MultiIndex.from_product([dates,columns])
panel = pandas.DataFrame(index=index, columns=columns)

This gives me a dataframe like this:

                a   b   c
2020-01-01  a   NaN NaN NaN
            b   NaN NaN NaN
            c   NaN NaN NaN
2020-01-02  a   NaN NaN NaN
            b   NaN NaN NaN
            c   NaN NaN NaN
2020-01-03  a   NaN NaN NaN
            b   NaN NaN NaN
            c   NaN NaN NaN
2020-01-04  a   NaN NaN NaN
            b   NaN NaN NaN
            c   NaN NaN NaN

I have another 2-dimensional dataframe, as follows:

df = pandas.DataFrame(index=dates, columns=columns, data=numpy.random.rand(len(dates), len(columns)))

Resulting in the following:

            a           b           c
2020-01-01  0.540867    0.426181    0.220182
2020-01-02  0.864340    0.432873    0.487878
2020-01-03  0.017099    0.181050    0.373139
2020-01-04  0.764557    0.097839    0.499788

I would like to assign to the [a,a] cell, across all dates, and the [a,b] cell, across all dates etc.

Something akin to the following:

for i in df.columns:
    for j in df.columns:
        panel.xs(i, level=1).loc[j] = df[i] * df[j]

Of course this doesn't work, because I am attempting to set a value on a copy of a slice

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

I tried several variations:

panel.loc[:,'a']         # selects all rows, and column a
panel.loc[(:, 'a'), 'a'] # invalid syntax
etc...

How can I select index level 1 (eg: row 'a'), column 'a', across all index level 0 - and be able to set the values?


Solution

  • Try broadcasing on the values:

    a = df.to_numpy()
    
    panel = pd.DataFrame((a[...,None] * a[:,None,:]).reshape(-1, df.shape[1]), 
                         index=panel.index, columns=panel.columns)
    

    Output:

                         a         b         c
    2020-01-01 a  0.292537  0.230507  0.119089
               b  0.230507  0.181630  0.093837
               c  0.119089  0.093837  0.048480
    2020-01-02 a  0.747084  0.374149  0.421692
               b  0.374149  0.187379  0.211189
               c  0.421692  0.211189  0.238025
    2020-01-03 a  0.000292  0.003096  0.006380
               b  0.003096  0.032779  0.067557
               c  0.006380  0.067557  0.139233
    2020-01-04 a  0.584547  0.074803  0.382116
               b  0.074803  0.009572  0.048899
               c  0.382116  0.048899  0.249788