I have some categorical values,
e.g. things = 'car', 'dog', 'pen', 'bar',
which I encode to numerical values via OneHotEncoding:
car dog pen bar
1 1 1 1
I want to use only some of the columns in my dataset,
e.g. car, dog and pen, but not bar.
I do it by defining the specific columns:
dataset = dataset[['car', 'dog', 'pen']]
But sometimes some of the columns I want are absent from my dataset, e.g. 'car'.
Then pandas raises the error:
KeyError: "['car'] not in index"
How can I solve this problem?
You can do some sanity checks. An example is the following function:
import pandas as pd

def custom_dataset(dataset, req_cols):
    in_, out_ = [], []
    if isinstance(dataset, pd.DataFrame):  # optional type check
        for col in req_cols:  # check every requested column
            if col in dataset.columns:
                in_.append(col)  # requested and present (i.e. valid)
            else:
                out_.append(col)  # requested but missing (i.e. invalid)
    # Empty lists are reported as None
    return (dataset[in_] if in_ else None), (out_ if out_ else None)
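As you can see, it returns a tuple of two elements: the dataset restricted to the requested columns that exist (or None if none of them exist), and the list of requested columns that are missing (or None if nothing is missing).
Even if the dataset is not an instance of DataFrame, or the user did not provide any columns to collect, the function won't raise an error; it will return (None, None).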
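As a rough illustration of the same check, here is a minimal sketch on made-up data. The sample DataFrame and the names present, missing, subset and filled are assumptions for the example, not part of the original question. It also shows a different technique from the function above: DataFrame.reindex with fill_value=0, which may suit one-hot encoded data when a missing category column should simply be all zeros.

```python
import pandas as pd

# Hypothetical one-hot encoded data in which the 'car' column is absent.
dataset = pd.DataFrame({"dog": [1, 0], "pen": [0, 1], "bar": [0, 0]})
req_cols = ["car", "dog", "pen"]

# Split the requested columns into present and missing, preserving order.
present = [col for col in req_cols if col in dataset.columns]
missing = [col for col in req_cols if col not in dataset.columns]

# Option 1: keep only the requested columns that actually exist.
subset = dataset[present]

# Option 2 (alternative technique): keep every requested column and
# fill the absent ones with 0 instead of raising KeyError.
filled = dataset.reindex(columns=req_cols, fill_value=0)

print(missing)
print(list(subset.columns))
print(list(filled.columns))
```

Which option fits depends on what comes next: dropping missing columns changes the shape of the data, while filling them with zeros keeps a fixed column layout, which a model trained on all three columns would probably expect.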