Making new pandas multiindex df from original multiindex

I have a multiindex dataframe, df1, pulled from yahoo using yfinance, with a timeseries index and two-level multiindex columns. Level 0 is 'adj close', 'high' and 'volume', and level 1 is a list of company symbols, like this:

	Adj Close			High			Volume
	C1	C2	C3	C1	C2	C3	C1	C2	C3
Date
02-01-2020	12.78	41.73	24.03	13.50	41.77	26.43	100000	1234300	23454
03-01-2020	12.29	41.11	23.61	12.77	42.09	23.99	100022	1555555	23999

Im trying to make a new dataframe, df2, by making calculations on the columns of df1.

For example, using the same timeseries index, Id like to make the first column in the new dataframe, df2, the percent increase for each of the (level 1) symbols in the (level 0) price column of df1. Then columns in df2 being rolling moving average of the Price (level 0) each symbol (level 1). Also a column that writes a boolean True if the High of today is higher than the Adj Close of yesterday.

Basically, each column in df2 will be made by manipulating level 0 data from the corresponding level 1 company symbols in df1.

(Im saying column I now know its not a column, but a group of columns grouped by the level 0)

So I wrote a function:

def indicators_df(df1):
    
    # Create a new DataFrame with the same index as df1
    df2 = pd.DataFrame(index=df1.index)
           
    # Plus / minus change %
    df2['Pct'] = df1['Adj Close'].pct_change().fillna(0)
    
    # Price MAs
    df2['$10MA'] = df1['Adj Close'].rolling(window=10).mean()
    df2['$20MA'] = df1['Adj Close'].rolling(window=20).mean()
    df2['$50MA'] = df1['Adj Close'].rolling(window=50).mean()
    df2['$100MA'] = df1['Adj Close'].rolling(window=100).mean()
    df2['$200MA'] = df1['Adj Close'].rolling(window=200).mean()
    
    # Volume MAs
    df2['V10MA'] = df1['Volume'].rolling(window=10).mean()
    df2['V20MA'] = df1['Volume'].rolling(window=20).mean()
    df2['V50MA'] = df1['Volume'].rolling(window=50).mean()
    df2['V100MA'] = df1['Volume'].rolling(window=100).mean()
    df2['V200MA'] = df1['Volume'].rolling(window=200).mean()
    
    return df2

which returns this error:

ValueError: Cannot set a DataFrame with multiple columns to the single column Pct

I tried loads of variations, but kept getting the same error until I realised (I think...) Im trying to put the results for all the level 1 company symbols in one column in the new dataframe, df2, which is not multiindex.

So how would I write the function correctly, creating a new multiindex dataframe where the level 0 values/keys (not sure of the correct terminology) are the calculations Ive made on df1, such as percent change, rolling average etc?

Thanks.

Solution

IIUC, as a simple approach, you can concat multiple xs on a predefined pair of computations :

def indicators_df(df):
    def xs(df, m):
        return df.xs(m, axis=1, drop_level=False)

    def rn(df, d):
        return df.rename(d, axis=1, level=0)

    pct = (df.pipe(xs, "Adj Close").pct_change()
           .fillna(0).pipe(rn, {"Adj Close": "Pct"}))

    tmp = df.pipe(xs, "High")
    chk = tmp.eq(tmp.shift()).pipe(rn, {"High": "Check"})
    
    _map = {k: [(w, f"{pre}{w}MA") for w in [10, 20, 50, 100, 200]]
            for k, pre in [("Adj Close", "$"), ("Volume", "V")]}

    pvs = [df.pipe(xs, l0).rolling(b).mean().pipe(rn, {l0:l1})
           for (l0, p) in _map.items() for (b,l1) in p]

    return pd.concat([pct, chk, *pvs], axis=1)

Output :

print(indicators_df(df))

             Pct              Check         ... V100MA     V200MA        
              C1    C2    C3     C1     C2  ...     C2  C3     C1  C2  C3
Date                                        ...                          
02-01-2020  0.00  0.00  0.00  False  False  ...    NaN NaN    NaN NaN NaN
03-01-2020 -0.04 -0.01 -0.02  False  False  ...    NaN NaN    NaN NaN NaN

[2 rows x 36 columns]

Used input :

df = pd.DataFrame.from_dict(
    {'index': ['02-01-2020', '03-01-2020'],
     'columns': [
         ('Adj Close', 'C1'), ('Adj Close', 'C2'), ('Adj Close', 'C3'),
         ('High', 'C1'), ('High', 'C2'), ('High', 'C3'), ('Volume', 'C1'),
         ('Volume', 'C2'),('Volume', 'C3')],
     'data': [[12.78, 41.73, 24.03, 13.5, 41.77,
               26.43, 100000, 1234300, 23454],
              [12.29, 41.11, 23.61, 12.77, 42.09,
               23.99, 100022, 1555555, 23999]],
     'index_names': ['Date'],
     'column_names': [None, None]}, orient='tight')