Search code examples
pandaspandas-datareader

pandas merge two dataframe to form a multiindex


I'm playing around with Pandas to see if I can do some stock calculation better/faster than with other tools. If I have a single stock it's easy to create daily calculation L

df['mystuff'] = df['Close']+1

If I download more than a ticker it gets complicated:

df = df.stack() 
df['mystuff'] = df['Close']+1
df = df.unstack()

If I want to use prevous' day "Close" it gets too complex for me. I thought I might go back to fetch a single ticker, do any operation with iloc[i-1] or something similar (I haven't figured it yet) and then merge the dataframes.

How do I merget two dataframes of single tickers to have a multiindex? So that:

f1 = web.DataReader('AAPL', 'yahoo', start, end)
f2 = web.DataReader('GOOG', 'yahoo', start, end)

is like

f = web.DataReader(['AAPL','GOOG'], 'yahoo', start, end)

Edit: This is the nearest thing to f I can create. It's not exactly the same so I'm not sure I can use it instead of f.

f_f = pd.concat(['AAPL':f1,'GOOG':f2},axis=1)

Maybe I should experiment with operations working on a multiindex instead of splitting work on simpler dataframes.

Full Code:

import pandas_datareader.data as web
import pandas as pd
from datetime import datetime

start = datetime(2001, 9, 1)
end = datetime(2019, 8, 31)
a = web.DataReader('AAPL', 'yahoo', start, end)
g = web.DataReader('GOOG', 'yahoo', start, end)
# here are shift/diff calculations that I don't knokw how to do with a multiindex
a_g = web.DataReader(['AAPL','GOOG'], 'yahoo', start, end)
merged = pd.concat({'AAPL':a,'GOOG':g},axis=1)

a_g.to_csv('ag.csv')
merged.to_csv('merged.csv')
import code; code.interact(local=locals())

side note: I don't know how to compare the two csv


Solution

  • This is not exactly the same but it returns Multiindex you can use as in the a_g case

    import pandas_datareader.data as web
    import pandas as pd
    from datetime import datetime
    
    start = datetime(2019, 7, 1)
    end = datetime(2019, 8, 31)
    out = []
    for tick in ["AAPL", "GOOG"]:
        d = web.DataReader(tick, 'yahoo', start, end)
        cols = [(col, tick) for col in d.columns]
        d.columns = pd.MultiIndex\
                      .from_tuples(cols,
                                   names=['Attributes', 'Symbols'] )
        out.append(d)
    
    df = pd.concat(out, axis=1)
    

    Update

    In case you want to calculate and add a new column in case you have multiindex columns you can follow this

    import pandas_datareader.data as web
    import pandas as pd
    from datetime import datetime
    
    start = datetime(2019, 7, 1)
    end = datetime(2019, 8, 31)
    
    ticks = ['AAPL','GOOG']
    df = web.DataReader(ticks, 'yahoo', start, end)
    names = list(df.columns.names)
    
    df1 = df["Close"].shift() 
    cols = [("New", col) for col in df1.columns]
    df1.columns = pd.MultiIndex.from_tuples(cols,
                                            names=names)
    
    df = df.join(df1)