python, regression, linear-regression

Which sample to use with the least squares method vs. sklearn's regression method?


When using sklearn's LinearRegression, we split the data with train_test_split. Do we have to use only the training data for OLS (the ordinary least squares method), or can we fit OLS on the full data and deduce the regression result from that?


Solution

  • A common mistake that beginner data scientists make is to use the test data as part of the learning process. Look at this diagram (from the linked source): [image]

    As you can see, the data is kept separate during the training process, and it is really important that it stays that way.

    Now, your question is about the least squares method. You may think that using the full data improves the process, but you are forgetting about the evaluation step. The score would then look better not because the regression is better, but only because you have already shown the model the data you are testing it with.
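
To make the point concrete, here is a minimal sketch, not a definitive recipe. It assumes synthetic data (the line y = 3x + 2 plus noise, invented purely for illustration) and a 25% test split. It fits sklearn's `LinearRegression` on the training rows only, repeats the same least squares fit by hand with `np.linalg.lstsq` on those same rows, and then scores the model on the held-out rows.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y = 3x + 2 + noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.0, size=200)

# Hold out 25% of the rows; the model must not see them during fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# LinearRegression is ordinary least squares, fitted on the training rows only
model = LinearRegression().fit(X_train, y_train)

# The same OLS fit done by hand on the same training rows
# (a column of ones is added so the first coefficient is the intercept)
A_train = np.column_stack([np.ones(len(X_train)), X_train])
beta, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
intercept_ols, slope_ols = beta

# Both approaches solve the same least squares problem, so they should agree
same_fit = np.allclose(
    [model.intercept_, model.coef_[0]], [intercept_ols, slope_ols]
)

# Evaluation on rows the model has not seen during fitting
test_r2 = model.score(X_test, y_test)
print(f"same fit: {same_fit}, test R^2: {test_r2:.3f}")
```

Whichever way you run the least squares fit, the rule is the same: fit on `X_train`, `y_train` and keep `X_test`, `y_test` for evaluation only. If the goal is purely to describe the data (inference on the coefficients) and you never report a held-out score, fitting on everything is a different use case, but then there is no test set left to evaluate with.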