python machine-learning graph overfitting-underfitting

My training and testing graph stays constant. Can anyone help me interpret it or explain where I have gone wrong?


I'm doing a simple machine learning project. My initial model was overfitting, as I understood from reading about what overfitting is and how to detect it. I then used SMOTE to reduce the overfitting and tried to check whether the model still overfits. I'm getting a graph that I can't interpret; I went through several links to understand what is happening but failed. Can anyone please tell me whether this graph is okay or whether something is wrong with it? (The picture and code are given below.)

[Image: training and testing graph]

import matplotlib.pyplot as plt
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score

def EF_final(x_train, y_train, x_test, y_test):
    train_scores, test_scores = [], []
    values = [i for i in range(1, 21)]
    # evaluate a decision tree for each depth
    for i in values:
        # configure the model
        model_ef = ExtraTreesClassifier(n_estimators=80, random_state=42, min_samples_split=2, min_samples_leaf=1, max_features='sqrt', max_depth=24, bootstrap=False)
        # fit model on the training dataset
        model_ef.fit(x_train, y_train)
        # evaluate on the train dataset
        train_yhat = model_ef.predict(x_train)
        train_acc = accuracy_score(y_train, train_yhat)
        train_scores.append(train_acc)
        # evaluate on the test dataset
        test_yhat = model_ef.predict(x_test)
        test_acc = accuracy_score(y_test, test_yhat)
        test_scores.append(test_acc)
        # summarize progress
        print('>%d, train: %.3f, test: %.3f' % (i, train_acc, test_acc))
    # plot of train and test scores vs tree depth
    plt.plot(values, train_scores, '-o', label='Train')
    plt.plot(values, test_scores, '-o', label='Test')
    plt.legend()
    plt.show()

Solution

  • I can't comment on the results of your model's predictions without seeing the data, but to answer the question in your title:
    You configure and create the same model on every loop iteration, without using the variable i to change the model's depth. The random_state of the model is constant as well, so every iteration produces the same result and the graph stays flat. Consider changing the model configuration line to

    model_ef = ExtraTreesClassifier(n_estimators=80, min_samples_split=2, min_samples_leaf=1, max_features='sqrt', max_depth=i, bootstrap=False)
    

    With this change, the graph will vary with depth and help you choose a better model. I can't comment on the accuracy itself, however, without knowing what kind of data is being passed.
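
The change described above can be sketched end to end as follows. This is a minimal illustration, not the asker's actual pipeline: the data comes from `make_classification` as a synthetic stand-in, the helper name `depth_sweep` is hypothetical, and `random_state=42` is kept here (a choice of this sketch) so that differences between iterations come from `max_depth` alone.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def depth_sweep(x_train, y_train, x_test, y_test, depths):
    """Fit one ExtraTreesClassifier per depth; return (train_scores, test_scores)."""
    train_scores, test_scores = [], []
    for depth in depths:
        # max_depth follows the loop variable, so each iteration builds a different model
        model = ExtraTreesClassifier(
            n_estimators=80,
            max_features="sqrt",
            max_depth=depth,
            bootstrap=False,
            random_state=42,  # fixed seed (assumption of this sketch): only depth varies
        )
        model.fit(x_train, y_train)
        train_scores.append(accuracy_score(y_train, model.predict(x_train)))
        test_scores.append(accuracy_score(y_test, model.predict(x_test)))
    return train_scores, test_scores


# Synthetic stand-in data (assumption: the asker's real data is not shown)
X, y = make_classification(n_samples=500, n_features=20, n_informative=8, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

depths = list(range(1, 21))
train_scores, test_scores = depth_sweep(x_train, y_train, x_test, y_test, depths)
for d, tr, te in zip(depths, train_scores, test_scores):
    print(">%d, train: %.3f, test: %.3f" % (d, tr, te))
```

The two returned lists can be passed to `plt.plot(depths, ...)` exactly as in the question's code. A widening gap between the train and test curves as depth grows would suggest overfitting at the larger depths.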