python scikit-learn decision-tree

KeyError while fitting data to DecisionTreeRegressor


I am working on a model to predict house prices. To build the model I am using sklearn's DecisionTreeRegressor. I split the data into training and validation sets with train_test_split, but when I try to fit the model I get the following error:

KeyError                                  Traceback (most recent call last)
<ipython-input-25-f4acd876feae> in <module>
      1 for max_leaf_nodes in [5, 50, 500, 5000]:
----> 2     my_mae = get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y)
      3     print("Max leaf nodes: %d  \t\t Mean Absolute Error:  %d" %(max_leaf_nodes, my_mae))

<ipython-input-21-1a489238552f> in get_mae(max_leaf_nodes, train_inp, val_inp, train_oup, val_oup)
      2 
      3     model = DecisionTreeRegressor(max_leaf_nodes, random_state=0)
----> 4     model.fit(train_inp, train_oup)
      5     predictions = model.predict(val_inp)
      6     mae = mean_absolute_error(val_oup, predictions)

~/anaconda3/lib/python3.7/site-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
   1140             sample_weight=sample_weight,
   1141             check_input=check_input,
-> 1142             X_idx_sorted=X_idx_sorted)
   1143         return self
   1144 

~/anaconda3/lib/python3.7/site-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    331                                                          self.n_classes_)
    332             else:
--> 333                 criterion = CRITERIA_REG[self.criterion](self.n_outputs_,
    334                                                          n_samples)
    335 

KeyError: 5

This is my code

get_mae function

def get_mae(max_leaf_nodes, train_inp, val_inp, train_oup, val_oup):

    model = DecisionTreeRegressor(max_leaf_nodes, random_state=0)
    model.fit(train_inp, train_oup)
    predictions = model.predict(val_inp)
    mae = mean_absolute_error(val_oup, predictions)

    return mae

reading the dataset

df = pd.read_csv('../DATASETS/melb_data.csv')

y = df.Price

features = ['Rooms', 'Bathroom', 'Landsize', 'Lattitude', 'Longtitude']
X = df[features]

train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

looping to find the best number of leaf_nodes

for max_leaf_nodes in [5, 50, 500, 5000]:
    my_mae = get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y)
    print("Max leaf nodes: %d  \t\t Mean Absolute Error:  %d" %(max_leaf_nodes, my_mae))

Solution

  • Because you are not passing max_leaf_nodes as a keyword argument to DecisionTreeRegressor, the integer 5 is being passed to the 'criterion' parameter.

    Unless you use keyword arguments, the 1st positional argument is bound to criterion, the 2nd to splitter, and so on. However, criterion only accepts one of “mse”, “friedman_mse” or “mae”, hence the KeyError: as the traceback shows, fit looks up CRITERIA_REG[self.criterion], and 5 is not one of those keys.

    Please try this code:

    def get_mae(max_leaf_nodes, train_inp, val_inp, train_oup, val_oup):    
        model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
        model.fit(train_inp, train_oup)
        predictions = model.predict(val_inp)
        mae = mean_absolute_error(val_oup, predictions)
    
        return mae
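
To see why the positional call ends in KeyError: 5, here is a minimal sketch that reproduces the mechanism without scikit-learn. ToyTreeRegressor and its CRITERIA_REG dictionary are hypothetical stand-ins written for illustration; they only mirror the parameter order described above and the dictionary lookup visible in the traceback, not the library's real implementation.

```python
# Hypothetical stand-in for the estimator; NOT scikit-learn's real code.
# It copies only the parameter order (criterion first) and the dict lookup
# that appears in the traceback.
CRITERIA_REG = {"mse": "MSE", "friedman_mse": "FriedmanMSE", "mae": "MAE"}


class ToyTreeRegressor:
    def __init__(self, criterion="mse", splitter="best", max_leaf_nodes=None):
        self.criterion = criterion
        self.splitter = splitter
        self.max_leaf_nodes = max_leaf_nodes

    def fit(self):
        # Raises KeyError when criterion is not one of the known names.
        return CRITERIA_REG[self.criterion]


positional = ToyTreeRegressor(5)              # 5 is bound to `criterion`
keyword = ToyTreeRegressor(max_leaf_nodes=5)  # 5 is bound to `max_leaf_nodes`

try:
    positional.fit()
    error_message = None
except KeyError as exc:
    error_message = f"KeyError: {exc}"

print(error_message)                           # KeyError: 5
print(keyword.criterion, keyword.max_leaf_nodes)  # mse 5
```

Passing the value by keyword leaves criterion at its default, so the lookup succeeds. As far as I know, recent scikit-learn releases make these constructor parameters keyword-only, so the same mistake there should be rejected with a TypeError at construction time rather than a KeyError during fit.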