python python-3.x pandas datetime pandas-loc

Error when selecting rows with .loc using a list of dates in pandas


I have the following code:

import pandas as pd
from pandas_datareader import data as web

df = web.DataReader('^GSPC', 'yahoo')
df['pct'] = df['Close'].pct_change()

dates_list = df.index[df['pct'].gt(0.002)]

df2 = web.DataReader('^GDAXI', 'yahoo')
df2['pct2'] = df2['Close'].pct_change()

I was trying to run this:

df2.loc[dates_list, 'pct2']

But I keep getting this error:

KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported,

I am guessing this is because data is missing for some of the dates in dates_list. To resolve this, I tried:

    idx1 = df.index
    idx2 = df2.index
    missing = idx2.difference(idx1)
    df.drop(missing, inplace = True)
    df2.drop(missing, inplace = True)

However, I am still getting the same error, and I don't understand why.


Solution

  • Note that dates_list was created from df, so it contains only dates that are present in the index of df.

    You then read df2 and attempt to retrieve pct2 for the rows on exactly these dates.

    But the index of df2 is not guaranteed to contain every date in dates_list, and any date that is missing there causes your exception.

    To avoid it, retrieve only the rows whose dates are present in the index of df2. To narrow the row specification down to these "allowed" dates, pass:

    dates_list[dates_list.isin(df2.index)]
    

    Run this alone and you will see the "allowed" dates (some dates will be eliminated).

    So change the offending instruction to:

    df2.loc[dates_list[dates_list.isin(df2.index)], 'pct2']
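
The filtering step above can be tried without downloading anything. The following is a minimal sketch that assumes small made-up frames in place of the two DataReader results (the dates and prices are hypothetical; only the column names pct and pct2 come from the question). It also shows two alternatives that should give a comparable result: Index.intersection, and reindex if you would rather keep the missing dates as NaN.

```python
import pandas as pd

# Hypothetical stand-ins for the two downloaded frames; the real data
# would come from web.DataReader as in the question.
idx_a = pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-06", "2021-01-07"])
idx_b = pd.to_datetime(["2021-01-04", "2021-01-06", "2021-01-07", "2021-01-08"])

df = pd.DataFrame({"Close": [100.0, 101.0, 100.5, 102.0]}, index=idx_a)
df["pct"] = df["Close"].pct_change()

df2 = pd.DataFrame({"Close": [50.0, 50.5, 50.4, 51.0]}, index=idx_b)
df2["pct2"] = df2["Close"].pct_change()

# Dates taken from df; 2021-01-05 is one of them but is absent from df2.
dates_list = df.index[df["pct"].gt(0.002)]

# Option 1 (the answer's approach): keep only dates that exist in df2.
valid = dates_list[dates_list.isin(df2.index)]
by_isin = df2.loc[valid, "pct2"]

# Option 2: the same selection expressed as an index intersection.
common = dates_list.intersection(df2.index)
by_intersection = df2.loc[common, "pct2"]

# Option 3: keep every requested date and get NaN where df2 has no row.
with_gaps = df2["pct2"].reindex(dates_list)

print(by_isin)
print(with_gaps)
```

Options 1 and 2 silently drop the dates that df2 does not have, while option 3 keeps them visible as NaN, which can be handy if you need the result aligned with dates_list.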