Search code examples
pythonpandasdataframeyahoo-finance

Setting yahoo finance date as dataframe index


i have a python function that stores the yahoo finance data to a dataframe.

from pandas.io.data import DataReader
bars= DataReader(symbol, "yahoo",hist_date, today)

and i get the result returned to bars as follows

DataFrame:

                    Open  High   Low  Close  Volume  Adj Close\nDate                                                  
                   \n2011-01-12  2.00  2.00  2.00   2.00     100   1.891661
                   \n2011-01-13  2.00  2.00  1.92   2.00    6800   1.891661
                   \n2011-01-14  1.84  2.24  1.84   2.19    1500   2.071369
                   \n2011-01-18  2.25  2.25  2.02   2.02    4300   1.910578
                   \n2011-01-19  2.07  2.12  2.07   2.12    3400   2.005161
                   \n2011-01-20  2.21  2.21  2.10   2.17    5000   2.052452
                   \n2011-01-21  2.25  2.25  2.20   2.20     600   2.080827
                   \n2011-01-24  2.20  2.20  2.12   2.18    2300   2.061911 

now i want to make the date coloumn as the index field of the dataframe. also when i try to display the dataframe in a table i cant get the date field displayed anywhere. is it because there is a \n before the coloumn heading and coloumn data.


Solution

  • First, let me retrieve 5 days of historical data for Google from Yahoo:

    from pandas.io.data import DataReader
    import datetime as dt
    
    today = dt.datetime.today().strftime('%Y-%m-%d')
    hist = (dt.datetime.today()-dt.timedelta(7)).strftime('%Y-%m-%d')
    
    df = DataReader('GOOG', 'yahoo', hist, today)
    
    df
    
                      Open        High        Low       Close   Volume  Adj Close
    Date                        
    2016-01-05  746.450012  752.000000  738.640015  742.580017  1947700 742.580017
    2016-01-06  730.000000  747.179993  728.919983  743.619995  1938600 743.619995
    2016-01-07  730.309998  738.500000  719.059998  726.390015  2944300 726.390015
    2016-01-08  731.450012  733.229980  713.000000  714.469971  2442600 714.469971
    

    which shows me 6 columns for data: OHLC, Volume and Adjusted Close, and one column for index called Date

    To ascertain that data frame is indeed indexed by date you may even try:

    df.index
    DatetimeIndex(['2016-01-05', '2016-01-06', '2016-01-07', '2016-01-08'], dtype='datetime64[ns]', name='Date', freq=None)
    

    Does this short exercise address your concern?

    PS
    The ordering of column names and Date after Adjusted Close appears wierd to me...