python · machine-learning · scikit-learn · logistic-regression

Is there a method to decrease the accuracy of a prediction model to generate more predictions?


So I am using scikit-learn to do some relatively basic machine learning in Python. I am trying to train a model to take in some feature values and return a 0 or a 1. In my specific case, an output of 0 means that the model doesn't think a Facebook post will be shared more than 10 times, whereas a 1 means the model predicts the given Facebook post will be shared more than 10 times.

I have trained a few different models using different techniques like logistic regression, neural networks, and stochastic gradient descent. Once I have trained these models, I test them, and for each model type (i.e. logistic regression, neural networks, etc.) I see how many 1 predictions each model made and how many it got right.

Now the problem I am faced with emerges. Say the logistic regression model, when tested on 3000 items of test data, predicted that 30 of the posts would get more than 10 shares (i.e. it returned 1 for them), and it was correct 97% of the time when it made a prediction of 1. This is all well and good, but I would be more than willing to trade some accuracy to generate more predictions. For example, if I could generate 200 predictions of 1 with 80% accuracy, I would make this tradeoff in a heartbeat.

What are the methods that I could use to go about doing this and how would it be done? Is it even possible?


Solution

  • This is basically the precision-recall tradeoff problem.

    For logistic regression, you can change the decision threshold to get higher recall (more 1 predictions) at the cost of lower precision.

    You can read more about it here: http://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html
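    A minimal sketch of the idea, using synthetic data as a stand-in for your Facebook-post features (the dataset, class balance, and thresholds here are all assumptions for illustration): `predict()` in scikit-learn uses a fixed 0.5 probability cutoff, but you can take `predict_proba()` and apply your own lower threshold to generate more 1 predictions.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import precision_score

    # Hypothetical stand-in for the real feature data: an imbalanced
    # binary problem where the positive class (1) is rare.
    X, y = make_classification(n_samples=3000, n_features=10,
                               weights=[0.9, 0.1], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # P(class == 1) for each test item.
    proba = model.predict_proba(X_test)[:, 1]

    # Lowering the threshold from the default 0.5 yields more
    # 1 predictions, typically at lower precision.
    for threshold in (0.5, 0.3, 0.1):
        preds = (proba >= threshold).astype(int)
        n_positive = int(preds.sum())
        prec = precision_score(y_test, preds)
        print(f"threshold={threshold}: {n_positive} predictions of 1, "
              f"precision={prec:.2f}")
    ```

    Since every item above a higher threshold is also above any lower one, the number of 1 predictions can only grow as the threshold drops; you then pick the threshold whose precision you're still comfortable with (the precision-recall curve in the linked example visualizes exactly this sweep).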