Search code examples
pythonpandaspandas-datareader

How to reshape a dataframe with added attributes


I have a Pandas DataFrame of stock closing prices from a list of tickers which I pulled from pandas_datareader.

tickers = ['FB', 'AAPL', 'AMZN', 'RSP']
stocks = web.DataReader(tickers,'yahoo',start_date,end_date)
stocks

Currently my DataFrame looks like this:

enter image description here

I tried to set the date and symbols as my index like so `stocks.set_index(['Date','Symbols]) but I get this error:

KeyError: "None of ['Date', 'Symbols'] are in the columns"

How do I reshape the DataFrame so my columns are in this order: Date, Symbols, Close


Solution

  • from pandas_datareader import data

    tickers = ['FB', 'AAPL', 'AMZN', 'RSP']
    start_date,end_date = "2020-04-01", "2020-04-10"
    stocks = data.DataReader(tickers,'yahoo',start_date,end_date)
    stocks_close = stocks['Close']
    
    s1 = stocks_close.stack(0) # multi-index pandas dataframe
    s2 = s1.reset_index() # no index, flatted and repeated
    

    s1 give :

    Date        Symbols
    2020-04-01  AAPL        240.910004
                AMZN       1907.699951
                FB          159.600006
                RSP          79.879997
    2020-04-02  AAPL        244.929993
                AMZN       1918.829956
                FB          158.190002
                RSP          81.139999
    2020-04-03  AAPL        241.410004
                AMZN       1906.589966
                FB          154.179993
                RSP          79.830002
    2020-04-06  AAPL        262.470001
                AMZN       1997.589966
                FB          165.550003
                RSP          85.870003
    2020-04-07  AAPL        259.429993
                AMZN       2011.599976
                FB          168.830002
                RSP          86.559998
    2020-04-08  AAPL        266.070007
                AMZN       2043.000000
                FB          174.279999
                RSP          90.139999
    2020-04-09  AAPL        267.989990
                AMZN       2042.760010
                FB          175.190002
                RSP          92.220001
    

    s2 :

             Date Symbols            0
    0  2020-04-01    AAPL   240.910004
    1  2020-04-01    AMZN  1907.699951
    2  2020-04-01      FB   159.600006
    3  2020-04-01     RSP    79.879997
    4  2020-04-02    AAPL   244.929993
    5  2020-04-02    AMZN  1918.829956
    6  2020-04-02      FB   158.190002
    7  2020-04-02     RSP    81.139999
    8  2020-04-03    AAPL   241.410004
    9  2020-04-03    AMZN  1906.589966
    10 2020-04-03      FB   154.179993
    11 2020-04-03     RSP    79.830002
    12 2020-04-06    AAPL   262.470001
    13 2020-04-06    AMZN  1997.589966
    14 2020-04-06      FB   165.550003
    15 2020-04-06     RSP    85.870003
    16 2020-04-07    AAPL   259.429993
    17 2020-04-07    AMZN  2011.599976
    18 2020-04-07      FB   168.830002
    19 2020-04-07     RSP    86.559998
    20 2020-04-08    AAPL   266.070007
    21 2020-04-08    AMZN  2043.000000
    22 2020-04-08      FB   174.279999
    23 2020-04-08     RSP    90.139999
    24 2020-04-09    AAPL   267.989990
    25 2020-04-09    AMZN  2042.760010
    26 2020-04-09      FB   175.190002
    27 2020-04-09     RSP    92.220001
    

    if u want rename the third column of s2:

    s2.columns = ['Dates', 'Symbols', 'Close']