python-2.7 pandas keyerror

KeyError: Not in index, using keys generated from a Pandas DataFrame on itself


I have two columns in a Pandas DataFrame that has datetimes as its index. The two columns contain data measuring the same parameter, but neither column is complete (some rows have no data at all, some rows have data in both columns, and others have data only in column 'a' or 'b').

I've written the following code to find gaps in the columns, generate a list of the index dates where these gaps appear, and use this list to find and replace the missing data. However, I get a KeyError: Not in index on line 3, which I don't understand because the keys I'm using to index came from the DataFrame itself. Could somebody explain why this is happening and what I can do to fix it? Here's the code:

def merge_func(df):
    null_index = df[(df['DOC_mg/L'].isnull() == False) & (df['TOC_mg/L'].isnull() == True)].index
    df['TOC_mg/L'][null_index] = df[null_index]['DOC_mg/L']
    notnull_index = df[(df['DOC_mg/L'].isnull() == True) & (df['TOC_mg/L'].isnull() == False)].index
    df['DOC_mg/L'][notnull_index] = df[notnull_index]['TOC_mg/L']

    df.insert(len(df.columns), 'Mean_mg/L', 0.0)
    df['Mean_mg/L'] = (df['DOC_mg/L'] + df['TOC_mg/L']) / 2
    return df

merge_func(sve)

Solution

  • Whenever you assign to a subset of a DataFrame, you should use .loc:

    df.loc[null_index, 'TOC_mg/L'] = df['DOC_mg/L']
    

    The error in your original code comes from the order of the subscripts on the right-hand side:

    df['TOC_mg/L'][null_index] = df[null_index]['DOC_mg/L']
    

    will raise an error, because df[null_index] looks the index values up as column labels rather than row labels. On a toy dataset I get IndexError: indices are out-of-bounds

    If you changed the order so that the column is selected first, it would probably work:

    df['TOC_mg/L'][null_index] = df['DOC_mg/L'][null_index]
    

    However, this is chained assignment and should be avoided, because the assignment may end up on a temporary copy instead of the original DataFrame; see the online docs.

    So you should use .loc:

    df.loc[null_index, 'TOC_mg/L'] = df['DOC_mg/L']
    df.loc[notnull_index, 'DOC_mg/L'] = df['TOC_mg/L']
    

    Note that the right-hand side does not need to be restricted to the same index: pandas aligns it on the index labels during the assignment, so only the selected rows are filled.
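To make the difference concrete, here is a minimal sketch on a toy DataFrame. The data values and the four-day date range are made up for illustration and only stand in for the asker's `sve` frame; the column names are taken from the question. It first shows that `df[null_index]` fails because the dates are looked up as column labels, then applies the `.loc` assignments from the answer. The exact exception type may differ between pandas versions, so the sketch catches both `KeyError` and `IndexError`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the asker's `sve` frame.
idx = pd.date_range("2020-01-01", periods=4, freq="D")
df = pd.DataFrame(
    {
        "DOC_mg/L": [1.0, np.nan, 3.0, np.nan],
        "TOC_mg/L": [np.nan, 2.0, 5.0, np.nan],
    },
    index=idx,
)

# Row labels where one column has a value and the other is missing.
null_index = df[df["DOC_mg/L"].notnull() & df["TOC_mg/L"].isnull()].index
notnull_index = df[df["DOC_mg/L"].isnull() & df["TOC_mg/L"].notnull()].index

# df[null_index] treats the dates as column labels, so the lookup fails.
try:
    df[null_index]
    raised = False
except (KeyError, IndexError):
    raised = True

# A single .loc call selects rows and column together. The right-hand
# Series is aligned on the index, so only the selected rows are filled.
df.loc[null_index, "TOC_mg/L"] = df["DOC_mg/L"]
df.loc[notnull_index, "DOC_mg/L"] = df["TOC_mg/L"]

# Plain column assignment creates the new column; no insert() is needed.
df["Mean_mg/L"] = (df["DOC_mg/L"] + df["TOC_mg/L"]) / 2
```

Rows where both columns are missing stay NaN in all three columns, since there is nothing to fill them from.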