Search code examples
pythonpandasdateyahoo-finance

"Date" column cannot be read by df_dates = df.loc[:,'Date'] (Data coming from yahoo Fin)


I am currently getting data from Yahoo Finance and for my code purpose i need to loc Date and Open variables.

df_dates = df.loc[:,'Date'] # Get all of the rows from the Date column
df_open = df.loc[:,'Open'] #Get all of the rows from the Open column

However, as I understand Date doesn't seem like a Column but we can see there is Column Called "Date"

I am pulling update from yahoo like this;

df = pdr.get_data_yahoo("AMD", start ='2019-10-01')

and after i execute ;

df_dates = df.loc[:,'Date'] # Get all of the rows from the Date column
df_open = df.loc[:,'Open'] #Get all of the rows from the Open column

This is the error I get.

--------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2896             try:
-> 2897                 return self._engine.get_loc(key)
   2898             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Date'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
8 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2897                 return self._engine.get_loc(key)
   2898             except KeyError:
-> 2899                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2900         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2901         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Date'

How can I locate the column "Date"?

Best,


Solution

  • from pandas_datareader import data as pdr
    df = pdr.get_data_yahoo("AMD", start ='2019-10-01')
    

    Checking the columns and index:

    df.columns
    

    Output:

    Index(['High', 'Low', 'Open', 'Close', 'Volume', 'Adj Close'], dtype='object')
    
    df.index
    

    Output

    DatetimeIndex(['2019-10-01', '2019-10-02', '2019-10-03', '2019-10-04',
                   '2019-10-07', '2019-10-08', '2019-10-09', '2019-10-10',
                   '2019-10-11', '2019-10-14', '2019-10-15', '2019-10-16',
                   '2019-10-17', '2019-10-18', '2019-10-21', '2019-10-22',
                   '2019-10-23', '2019-10-24', '2019-10-25', '2019-10-28',
                   '2019-10-29', '2019-10-30', '2019-10-31', '2019-11-01',
                   '2019-11-04', '2019-11-05', '2019-11-06', '2019-11-07',
                   '2019-11-08'],
                  dtype='datetime64[ns]', name='Date', freq=None)
    

    As you can see Date belongs to the index and therefore you cannot select it by its label. Using reset_index directly on the read data frame may not be convenient because the rest of the columns no longer have the Date index. Use Index.to_frame and then DataFrame.reset_index

    df_Open=df[['Open']] #this returns a DataFrame, you don't need loc
    df_dates=df.index.to_frame().reset_index(drop=True)
    print(df_dates)
    
             Date
    0  2019-10-01
    1  2019-10-02
    2  2019-10-03
    3  2019-10-04
    4  2019-10-07
    5  2019-10-08
    6  2019-10-09
    7  2019-10-10
    8  2019-10-11
    9  2019-10-14
    10 2019-10-15
    11 2019-10-16
    12 2019-10-17
    13 2019-10-18
    14 2019-10-21
    15 2019-10-22
    16 2019-10-23
    17 2019-10-24
    18 2019-10-25
    19 2019-10-28
    20 2019-10-29
    21 2019-10-30
    22 2019-10-31
    23 2019-11-01
    24 2019-11-04
    25 2019-11-05
    26 2019-11-06
    27 2019-11-07
    28 2019-11-08