Search code examples
pythonpython-3.xpandasmulti-index

Interacting with Multiindex Pandas in Python


I have the following multiindex dataframe:

df
Out[44]: 
Attributes     Adj Close                            ...      Volume             new
Symbols    ADANIPORTS.NS ASIANPAINT.NS AXISBANK.NS  ...    WIPRO.NS     ZEEL.NS    
Date                                                ...                            
2015-06-12    304.541199    678.997864  536.043945  ...   5907771.0    779489.0 NaN
2015-06-15    302.589355    672.165955  534.683044  ...   2754255.0   1515607.0 NaN
2015-06-16    305.663574    678.324219  535.752380  ...   3010103.0   2013937.0 NaN
2015-06-17    303.516510    683.809143  533.905396  ...   3640391.0   2795132.0 NaN
2015-06-18    303.516510    701.417847  534.683044  ...   4764172.0   1354886.0 NaN
                 ...           ...         ...  ...         ...         ...  ..
2020-06-04    339.200012   1633.750000  394.350006  ...   7397468.0  47567706.0 NaN
2020-06-05    341.149994   1638.550049  405.299988  ...   7245584.0  36525490.0 NaN
2020-06-08    343.299988   1634.250000  430.250000  ...  12213696.0  27555715.0 NaN
2020-06-09    346.049988   1642.050049  420.049988  ...   7845252.0  27072444.0 NaN
2020-06-10    346.549988   1635.949951  427.450012  ...   7391699.0  17115912.0 NaN

[1227 rows x 301 columns]

So when i run

df.columns.levels
Out[45]: FrozenList([['Adj Close', 'Close', 'High', 'Low', 'Open', 'Volume', 'new'], ['ADANIPORTS.NS', 'ASIANPAINT.NS', 'AXISBANK.NS', 'BAJAJ-AUTO.NS', 'BAJFINANCE.NS', 'BAJAJFINSV.NS', 'BHARTIARTL.NS', 'INFRATEL.NS', 'BPCL.NS', 'BRITANNIA.NS', 'CIPLA.NS', 'COALINDIA.NS', 'DRREDDY.NS', 'EICHERMOT.NS', 'GAIL.NS', 'GRASIM.NS', 'HCLTECH.NS', 'HDFC.NS', 'HDFCBANK.NS', 'HEROMOTOCO.NS', 'HINDALCO.NS', 'HINDUNILVR.NS', 'ICICIBANK.NS', 'INDUSINDBK.NS', 'INFY.NS', 'IOC.NS', 'ITC.NS', 'JSWSTEEL.NS', 'KOTAKBANK.NS', 'LT.NS', 'M&M.NS', 'MARUTI.NS', 'NESTLEIND.NS', 'NTPC.NS', 'ONGC.NS', 'POWERGRID.NS', 'RELIANCE.NS', 'SHREECEM.NS', 'SBIN.NS', 'SUNPHARMA.NS', 'TCS.NS', 'TATAMOTORS.NS', 'TATASTEEL.NS', 'TECHM.NS', 'TITAN.NS', 'ULTRACEMCO.NS', 'UPL.NS', 'VEDL.NS', 'WIPRO.NS', 'ZEEL.NS', '']])

Finally when i run:

df['Close']
Out[46]: 
Symbols     ADANIPORTS.NS  ASIANPAINT.NS  ...    WIPRO.NS     ZEEL.NS
Date                                      ...                        
2015-06-12     312.049988     705.650024  ...  201.505997  333.700012
2015-06-15     310.049988     698.549988  ...  202.837997  339.250000
2015-06-16     313.200012     704.950012  ...  202.613007  345.000000
2015-06-17     311.000000     710.650024  ...  204.955994  351.100006
2015-06-18     311.000000     728.950012  ...  208.537994  347.000000
                  ...            ...  ...         ...         ...
2020-06-04     339.200012    1633.750000  ...  216.250000  205.600006
2020-06-05     341.149994    1638.550049  ...  218.750000  206.199997
2020-06-08     343.299988    1634.250000  ...  226.449997  197.050003
2020-06-09     346.049988    1642.050049  ...  218.350006  192.399994
2020-06-10     346.549988    1635.949951  ...  217.000000  189.800003

So far so good. My quesiton is when i run the following command:

df['new'] =df['Close'].pct_change()

i get this:

df['new']
Out[40]: 
Date
2015-06-12   NaN
2015-06-15   NaN
2015-06-16   NaN
2015-06-17   NaN
2015-06-18   NaN
              ..
2020-06-04   NaN
2020-06-05   NaN
2020-06-08   NaN
2020-06-09   NaN
2020-06-10   NaN
Name: new, Length: 1227, dtype: float64

Why am i getting NaN? This is very odd because when i run:

df['Close'].pct_change()
Out[50]: 
Symbols     ADANIPORTS.NS  ASIANPAINT.NS  ...  WIPRO.NS   ZEEL.NS
Date                                      ...                    
2015-06-12            NaN            NaN  ...       NaN       NaN
2015-06-15      -0.006409      -0.010062  ...  0.006610  0.016632
2015-06-16       0.010160       0.009162  ... -0.001109  0.016949
2015-06-17      -0.007024       0.008086  ...  0.011564  0.017681
2015-06-18       0.000000       0.025751  ...  0.017477 -0.011678
                  ...            ...  ...       ...       ...
2020-06-04      -0.001031      -0.048236  ...  0.023184  0.050588
2020-06-05       0.005749       0.002938  ...  0.011561  0.002918
2020-06-08       0.006302      -0.002624  ...  0.035200 -0.044374
2020-06-09       0.008010       0.004773  ... -0.035769 -0.023598
2020-06-10       0.001445      -0.003715  ... -0.006183 -0.013513

[1227 rows x 50 columns]

What can i do to get pct_change column? What i am expecting is to get a new column which will calculate the pct_change of the closing price of each ticker. How can i do this?

enter image description here


Solution

  • You can add a new level with Close percent changes for all tickers using pd.concat(), ie.:

    df = pd.concat([df, df[['Close']].pct_change().rename(columns={'Close': 'Close_pct'})], axis=1)
    

    Note the double brackets: df[['Close']] - this preserves the column levels of the slice.