I am using a CSV file from a Udemy course for practice. To keep things simple, I only want to use the Age and Country columns. Here is the code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.compose import ColumnTransformer as ct
from sklearn.model_selection import train_test_split as tts
data = pd.read_csv("advertising.csv")
X = data[["Age","Country"]]
y = data[["Clicked on Ad"]]
from sklearn.preprocessing import OneHotEncoder
cat = X["Country"]
one_hot = OneHotEncoder()
transformer = ct([("one_hot", one_hot, cat)],remainder="passthrough")
transformed_X = transformer.fit_transform(X)
print(transformed_X)
I get this error:
runfile('C:/Users/--/.spyder-py3/untitled0.py', wdir='C:/Users/--/.spyder-py3')
Traceback (most recent call last):
File "C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py", line 2895, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Tunisia'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Anaconda\lib\site-packages\sklearn\utils\__init__.py", line 447, in _get_column_indices
col_idx = all_columns.get_loc(col)
File "C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py", line 2897, in get_loc
raise KeyError(key) from err
KeyError: 'Tunisia'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\--\.spyder-py3\untitled0.py", line 17, in <module>
transformed_X = transformer.fit_transform(X)
File "C:\Anaconda\lib\site-packages\sklearn\compose\_column_transformer.py", line 529, in fit_transform
self._validate_remainder(X)
File "C:\Anaconda\lib\site-packages\sklearn\compose\_column_transformer.py", line 327, in _validate_remainder
cols.extend(_get_column_indices(X, columns))
File "C:\Anaconda\lib\site-packages\sklearn\utils\__init__.py", line 454, in _get_column_indices
raise ValueError(
ValueError: A given column is not a column of the dataframe
"Tunisia" is the first country under the column of "Country"
What might have caused the problem?
Thank you in advance.
The problem occurs because you are not specifying the column to transform correctly. In this line:
transformer = ct([("one_hot", one_hot, cat)],remainder="passthrough")
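cat should hold the name or the index of the column you want to transform. However, because you set cat = X["Country"], you are passing a pandas Series containing the column's values instead. ColumnTransformer then tries to look up each of those values as a column name, which is why the error mentions 'Tunisia': it is the first value in the Country column, not a column of X.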
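To see what cat actually holds, here is a small sketch using pandas only. The three-row DataFrame is a made-up stand-in for advertising.csv, so the values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the Age and Country columns of advertising.csv
X = pd.DataFrame({
    "Age": [35, 31, 26],
    "Country": ["Tunisia", "Nauru", "Italy"],
})

cat = X["Country"]

# cat is a Series of values, not a column name or a list of column names
print(type(cat).__name__)

# The values are countries; none of them is a column label of X
print("Tunisia" in X.columns)
print("Country" in X.columns)
```

The first check shows that "Tunisia" is a value inside the column, while "Country" is the label that ColumnTransformer expects.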
To fix this issue, use one of the following:
# option 1: select the column by name
cat = ['Country']
# option 2: select the column by position (Country is the second column of X)
cat = [1]
Either option should work, since both tell ColumnTransformer which column of X to encode.
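For reference, here is a minimal runnable sketch of the corrected code. It assumes a small made-up DataFrame in place of advertising.csv, and it sets sparse_threshold=0 (not in the original code) only so that the result prints as a plain dense array.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for advertising.csv
data = pd.DataFrame({
    "Age": [35, 31, 26, 29],
    "Country": ["Tunisia", "Nauru", "Italy", "Tunisia"],
    "Clicked on Ad": [0, 0, 1, 0],
})
X = data[["Age", "Country"]]

# Option 1: refer to the column by name
by_name = ColumnTransformer(
    [("one_hot", OneHotEncoder(), ["Country"])],
    remainder="passthrough",
    sparse_threshold=0,  # assumption: force dense output for readability
)
transformed_by_name = by_name.fit_transform(X)

# Option 2: refer to the column by position
by_index = ColumnTransformer(
    [("one_hot", OneHotEncoder(), [1])],
    remainder="passthrough",
    sparse_threshold=0,
)
transformed_by_index = by_index.fit_transform(X)

# One column per distinct country, followed by the passed-through Age column
print(transformed_by_name)
```

In this sketch both transformers produce the same array: the one-hot columns come first, and the remaining Age column is appended at the end by remainder="passthrough".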