I am working on a model to predict house prices. To build the model I am using sklearn's DecisionTreeRegressor. I split the data into training and validation sets with train_test_split. But when I try to fit the model to the data, I get the following error:
KeyError Traceback (most recent call last)
<ipython-input-25-f4acd876feae> in <module>
1 for max_leaf_nodes in [5, 50, 500, 5000]:
----> 2 my_mae = get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y)
3 print("Max leaf nodes: %d \t\t Mean Absolute Error: %d" %(max_leaf_nodes, my_mae))
<ipython-input-21-1a489238552f> in get_mae(max_leaf_nodes, train_inp, val_inp, train_oup, val_oup)
2
3 model = DecisionTreeRegressor(max_leaf_nodes, random_state=0)
----> 4 model.fit(train_inp, train_oup)
5 predictions = model.predict(val_inp)
6 mae = mean_absolute_error(val_oup, predictions)
~/anaconda3/lib/python3.7/site-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
1140 sample_weight=sample_weight,
1141 check_input=check_input,
-> 1142 X_idx_sorted=X_idx_sorted)
1143 return self
1144
~/anaconda3/lib/python3.7/site-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
331 self.n_classes_)
332 else:
--> 333 criterion = CRITERIA_REG[self.criterion](self.n_outputs_,
334 n_samples)
335
KeyError: 5
This is my code
get_mae function
def get_mae(max_leaf_nodes, train_inp, val_inp, train_oup, val_oup):
    model = DecisionTreeRegressor(max_leaf_nodes, random_state=0)
    model.fit(train_inp, train_oup)
    predictions = model.predict(val_inp)
    mae = mean_absolute_error(val_oup, predictions)
    return mae
reading the dataset
df = pd.read_csv('../DATASETS/melb_data.csv')
y = df.Price
features = ['Rooms', 'Bathroom', 'Landsize', 'Lattitude', 'Longtitude']
X = df[features]
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)
looping to find the best number of leaf_nodes
for max_leaf_nodes in [5, 50, 500, 5000]:
    my_mae = get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y)
    print("Max leaf nodes: %d \t\t Mean Absolute Error: %d" %(max_leaf_nodes, my_mae))
Because you are not passing max_leaf_nodes as a keyword argument to DecisionTreeRegressor, the integer 5 is being passed positionally to the 'criterion' parameter.
Unless you use keyword arguments, the 1st argument is bound to criterion, the 2nd to splitter, and so on. However, criterion only accepts one of "mse", "friedman_mse" or "mae". As your traceback shows, fit looks up self.criterion in CRITERIA_REG, and since 5 is not one of those keys you get KeyError: 5.
Please try this code:
def get_mae(max_leaf_nodes, train_inp, val_inp, train_oup, val_oup):
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(train_inp, train_oup)
    predictions = model.predict(val_inp)
    mae = mean_absolute_error(val_oup, predictions)
    return mae
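To see why the positional call fails, here is a minimal sketch that reproduces the same binding without scikit-learn. ToyTreeRegressor and the placeholder values in CRITERIA_REG are hypothetical stand-ins written for illustration; only the parameter order (criterion first, then splitter) and the dictionary lookup are taken from the explanation and traceback above. It is not scikit-learn's actual implementation.

```python
# Toy lookup table standing in for the criterion registry seen in the traceback.
# The values are placeholders, not scikit-learn objects.
CRITERIA_REG = {"mse": "MSE", "friedman_mse": "FriedmanMSE", "mae": "MAE"}


class ToyTreeRegressor:
    """Hypothetical stand-in whose first parameter is criterion."""

    def __init__(self, criterion="mse", splitter="best",
                 max_leaf_nodes=None, random_state=None):
        self.criterion = criterion
        self.splitter = splitter
        self.max_leaf_nodes = max_leaf_nodes
        self.random_state = random_state

    def fit(self):
        # The dictionary lookup raises KeyError for any unknown criterion.
        return CRITERIA_REG[self.criterion]


# Positional call: 5 is bound to criterion, not max_leaf_nodes.
positional = ToyTreeRegressor(5, random_state=0)
try:
    positional.fit()
    error_message = None
except KeyError as err:
    error_message = f"KeyError: {err}"
print(error_message)  # KeyError: 5

# Keyword call: 5 reaches max_leaf_nodes and criterion keeps its default.
keyword = ToyTreeRegressor(max_leaf_nodes=5, random_state=0)
print(keyword.criterion, keyword.max_leaf_nodes, keyword.fit())  # mse 5 MSE
```

Naming the argument makes the call independent of parameter order, which is why the corrected get_mae works.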