Search code examples
pythoniterationmulti-indexbloomberg

Adding columns to MultiIndex dataframe via iteration


I have a MultiIndex problem. I use a 3rd party package that hands me stock prices and PE ratios in MultiIndex format. What I'm trying to do is iteratively add two new columns for each ticker, which calculates the mean and standard dev for the PE ratio.

The rough data structure can be replicated with this code:

arrays = [['GOOGL US Equity','GOOGL US Equity','IBM US Equity','IBM US Equity'],['LAST_PRICE','BEST_PE_RATIO']*2]
columns = pd.MultiIndex.from_arrays(arrays, names=['ticker','field'])

df = pd.DataFrame(np.random.randn(4,4), 
                  columns=columns, 
                  index=pd.date_range('20160103',periods=4))

The output resembles the following:

df

I've attempted to do this ad hoc with the code below, but keep getting handfuls of keyerrors and 'slice' errors...

df['GOOGL US Equity']['1 STD DEV'] = df['GOOGL US Equity']['BEST_PE_RATIO'].std()

Is there a more pythonic solution that creates the new needed columns and iterates thru the tickers?


Solution

  • When setting values to dataframe via chained indexing, sometimes it fails and the result is hard to predict. See this:Returning a view versus a copy in Pandas. I think df[('GOOGL US Equity','1 STD DEV')] = df['GOOGL US Equity']['BEST_PE_RATIO'].std() is the solution.

    If you have many tickers, then do it like this:

    for ticker in df.columns.levels[0]:
        df[(ticker, '1 STD DEV')] = df[ticker]['BEST_PE_RATIO'].std()
    

    Besides, if you want to resort the columns, use df.sort_index(axis=1).