python, kaggle, keyerror

What does a KeyError mean, and how can I resolve it?


I am very new to Python and coding, and I am working on a predictive model for a Kaggle prediction competition. I am trying to write code that deletes a variable I consider unimportant for predicting survival in the sinking of the Titanic (the Kaggle competition prompt). FYI, 'Cabin' is a defined name: it is one of the variables (columns) in the data provided.

My code is:

import re
deck = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "U": 8}
data = [train_df, test_df]

for dataset in data:
    dataset['Cabin'] = dataset['Cabin'].fillna("U0")
    dataset['Deck'] = dataset['Cabin'].map(lambda x: re.compile("([a-zA-Z]+)").search(x).group())
    dataset['Deck'] = dataset['Deck'].map(deck)
    dataset['Deck'] = dataset['Deck'].fillna(0)
    dataset['Deck'] = dataset['Deck'].astype(int)

train_df = train_df.drop(['Cabin'], axis=1)
test_df = test_df.drop(['Cabin'], axis=1)

And the error I received was:


KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2894             try:
-> 2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Cabin'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-52-b7c547f14770> in <module>
      4 
      5 for dataset in data:
----> 6     dataset['Cabin'] = dataset['Cabin'].fillna("U0")
      7     dataset['Deck'] = dataset['Cabin'].map(lambda x: re.compile("([a-zAZ]+)").search(x).group())
      8     dataset['Deck'] = dataset['Deck'].map(deck)

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2900             if self.columns.nlevels > 1:
   2901                 return self._getitem_multilevel(key)
-> 2902             indexer = self.columns.get_loc(key)
   2903             if is_integer(indexer):
   2904                 indexer = [indexer]

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:
-> 2897                 raise KeyError(key) from err
   2898 
   2899         if tolerance is not None:

KeyError: 'Cabin'

I am not sure what the error means or how to fix it, so I would deeply appreciate any help!


Solution

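  • The large majority of the time, a Python KeyError is raised because a key was looked up in a dictionary (or a dictionary subclass) and was not found. A pandas DataFrame raises the same error when it is indexed with a column name it does not have, so KeyError: 'Cabin' means the DataFrame has no column named 'Cabin' at the moment line 6 runs.

    -- Check whether train_df and test_df actually have a column named 'Cabin', for example with print(train_df.columns). Note that the code drops 'Cabin' in its last two lines, so if the cell is run a second time in the same notebook session without reloading the data, the column is already gone and this exact error is raised.

    Here is a complete example that reloads the data before processing it: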

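    import re
    import pandas as pd

    # Reload the CSV files so that 'Cabin' is present again.
    test_df = pd.read_csv("test.csv")
    train_df = pd.read_csv("train.csv")

    # Map each deck letter to a number; "U" stands for an unknown cabin.
    deck = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "U": 8}
    deck_pattern = re.compile("([a-zA-Z]+)")  # compiled once, reused per row
    data = [train_df, test_df]

    for dataset in data:
        # Missing cabins become "U0" so the regex always finds a letter.
        dataset['Cabin'] = dataset['Cabin'].fillna("U0")
        dataset['Deck'] = dataset['Cabin'].map(
            lambda x: deck_pattern.search(x).group())
        # Letters that are not in the mapping become NaN, then 0.
        dataset['Deck'] = dataset['Deck'].map(deck)
        dataset['Deck'] = dataset['Deck'].fillna(0)
        dataset['Deck'] = dataset['Deck'].astype(int)

    # 'Cabin' is no longer needed once 'Deck' has been derived from it.
    train_df = train_df.drop(['Cabin'], axis=1)
    test_df = test_df.drop(['Cabin'], axis=1)
    print(train_df, test_df)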
    

    The training and test files (train.csv and test.csv) are the ones provided for the competition.
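
If the error comes from running the cell more than once, one option is to make the step safe to repeat. The following is only a minimal sketch under some assumptions: the three-row DataFrame is made-up stand-in data, the helper name add_deck_and_drop_cabin is hypothetical, and the deck letter is taken as the first character of the cabin code, which is a simplification of the regular expression used in the question.

```python
import pandas as pd

# Made-up stand-in for the Titanic data, for illustration only.
df = pd.DataFrame({"Name": ["A", "B", "C"], "Cabin": ["C85", None, "E46"]})


def add_deck_and_drop_cabin(frame):
    """Return a copy with a 'Deck' letter column and without 'Cabin'.

    Safe to call repeatedly: if 'Cabin' is already gone, the frame is
    returned unchanged instead of raising a KeyError.
    """
    if "Cabin" not in frame.columns:
        return frame
    out = frame.copy()
    # First character of the cabin code; "U" marks a missing cabin.
    out["Deck"] = out["Cabin"].fillna("U0").str[0]
    return out.drop(columns=["Cabin"])


once = add_deck_and_drop_cabin(df)
twice = add_deck_and_drop_cabin(once)  # second run: no KeyError

# Reproducing the error from the question on purpose:
try:
    once["Cabin"]
except KeyError as err:
    missing = err.args[0]  # the column name that could not be found
```

The membership test on frame.columns is what prevents the error. If all that is needed is a drop that tolerates a missing column, DataFrame.drop also accepts errors="ignore", as in df.drop(columns=["Cabin"], errors="ignore").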