Search code examples
pandasdataframecomparemulti-index

Comparing columns in a multiindex dataframe, creating new df with result


Lets say I have this df:

import pandas as pd
import numpy as np

# Sample MultiIndex DataFrame
index = pd.date_range('2023-01-01', '2023-01-05', freq='D')
codes = ['A', 'B', 'C']
columns = pd.MultiIndex.from_product([['MA1', 'MA2', 'MA3', 'MA4', 'MA5'], codes], names=['MA', 'code'])

data = np.random.randint(1, 10, size=(len(index), len(columns)))
df = pd.DataFrame(data, index=index, columns=columns)
df

The index is datetime and on the multiindex columns, Level 0 is made up of 5 different moving averages (MA1 - MA5) and level 1 is made up of 3 codes (A, B and C). (The real df is far bigger)

Im trying to compare A, B and C's values of MA1 with each of MA2, MA3, MA4 and MA5, and create a new_df.

The new_df will have the same datetime index, and the same level 1 of the multiindex columns (A, B and C), but Level 0 will be different: 'MA1>MA2', 'MA1>MA3', 'MA1>MA4', 'MA1>MA5'.

The new_df will be made up of 1s, -1s and 0s. A 1 if value >, -1 if <, and 0 if same.

So, for each date, the first Level 0 column of new_df ('MA1>MA2') will compare the value of A B and C's MA1 values with their MA2 values, resulting in a 1, -1 or 0.

The second level 0 column of new_df ('MA1>MA3') will compare A, B and c's MA1 values with the MA3 values, etc

Im stuck with invalid index errors, trying to get the right slices to compare... can anyone help?

Thanks!


Solution

  • You can subtract everything in the dataframe from MA1, using np.sign to generate your desired 1/0/-1 result, and then rename the columns if desired:

    out = np.sign(df.loc[:,'MA1'] - df).drop(columns='MA1')
    out.columns = pd.MultiIndex.from_tuples([(f'MA1>{a}',b) for a,b in out.columns], names=out.columns.names)
    

    For my sample data of

    MA         MA1       MA2       MA3       MA4       MA5
    code         A  B  C   A  B  C   A  B  C   A  B  C   A  B  C
    2023-01-01   5  6  5   4  6  8   7  3  8   9  8  2   3  1  6
    2023-01-02   9  7  3   5  3  7   4  9  5   6  8  7   9  3  4
    2023-01-03   2  9  4   9  2  5   6  3  4   3  7  3   4  1  8
    2023-01-04   7  1  7   9  1  1   6  1  7   1  8  3   2  8  7
    2023-01-05   1  6  2   1  8  2   4  9  4   9  9  7   3  1  4
    

    This gives:

    MA         MA1>MA2       MA1>MA3       MA1>MA4       MA1>MA5
    code             A  B  C       A  B  C       A  B  C       A  B  C
    2023-01-01       1  0 -1      -1  1 -1      -1 -1  1       1  1 -1
    2023-01-02       1  1 -1       1 -1 -1       1 -1 -1       0  1 -1
    2023-01-03      -1  1 -1      -1  1  0      -1  1  1      -1  1 -1
    2023-01-04      -1  0  1       1  0  0       1 -1  1       1 -1  0
    2023-01-05       0 -1  0      -1 -1 -1      -1 -1 -1      -1  1 -1