Tags: python, machine-learning, random-forest, prediction

Is Random Forest regression good for this kind of regression problem?


I am working on vehicle occupancy prediction and I am very new to this. I have used random forest regression to predict the occupancy values.

[Image: Jupyter notebook, random forest]

I have around 48 million rows and used all of them to predict occupancy. Because the population and occupancy values are large, I normalized them before fitting the model. I am fairly sure the model is not good, but how should I interpret the RMSE and MAE? The plot also shows that the predictions are poor. Am I going about predicting vehicle occupancy the right way?

Kindly help me with the following:

  1. Is random forest regression a good method for this problem?
  2. How can I improve the model results?
  3. How should I interpret the results?

Solution

    1. Is random forest regression a good method for this problem?

      -> A model is just a tool, and random forest regression can certainly be used here. Whether it is suitable cannot be answered without studying the distribution of your data. It is worth trying other regression models for comparison, such as linear regression or support vector regression.
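      One way to check suitability empirically is to cross-validate a few candidate models on the same data. The sketch below is only an illustration: the data is synthetic, the feature count is made up, and the choice of linear regression and a mean-predicting baseline as comparison models is an assumption, not something taken from the notebook. With 48 million rows it is probably more practical to run this kind of comparison on a random sample first.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption): replace with a sample of the real
# features and occupancy target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 3.0 * X[:, 0] + np.sin(2.0 * X[:, 1]) + rng.normal(scale=0.3, size=500)

models = {
    "mean baseline": DummyRegressor(strategy="mean"),
    "linear regression": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

cv_mae = {}
for name, model in models.items():
    # scikit-learn reports negated MAE so that higher is better; flip the sign back.
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    cv_mae[name] = -scores.mean()
    print(f"{name}: cross-validated MAE = {cv_mae[name]:.3f}")
```

      If the random forest is not clearly better than the simpler models, that suggests the problem lies in the features or the data rather than in the choice of algorithm.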

    2. How can I improve the model results?

      -> I have several suggestions:

         1. Do not normalize or standardize the y column without first checking whether it contains extreme values.
         2. Compute RMSE and MAE on the original (unscaled) y values.
         3. Understand the business logic in depth and add new features based on it.
         4. Read up on data preprocessing and feature engineering; many blog posts cover both.
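      The point about computing the metrics on the original y values can be sketched as follows. This is a minimal example under assumptions: the data is synthetic, and `MinMaxScaler` stands in for whatever normalization the notebook actually used. The key step is `inverse_transform`, which maps predictions back to occupancy units before the errors are measured.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real data (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(400, 3))
y = 200 * X[:, 0] + 50 * X[:, 1] + rng.normal(scale=5, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the target scaler on the training targets only.
y_scaler = MinMaxScaler()
y_train_scaled = y_scaler.fit_transform(y_train.reshape(-1, 1)).ravel()

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train_scaled)

# Predictions come out on the scaled range; map them back to original units.
pred_scaled = model.predict(X_test)
pred = y_scaler.inverse_transform(pred_scaled.reshape(-1, 1)).ravel()

# Both metrics are now in the same units as occupancy.
mae = mean_absolute_error(y_test, pred)
rmse = np.sqrt(mean_squared_error(y_test, pred))
print(f"MAE  = {mae:.2f}")
print(f"RMSE = {rmse:.2f}")
```

      Reported this way, an MAE of 10 means the predictions are off by 10 occupancy units on average. RMSE is never smaller than MAE, and a large gap between the two points to a few very large errors.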

    3. How should I interpret the results?

      -> Bad results do not necessarily mean the model has no value. Compare it with the existing method and check whether it produces more economic value: every prediction error is a cost, and every gain in accuracy is a benefit.
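      A rough way to quantify "better than the existing method" is the fraction of error the model removes relative to that method. In the sketch below, the helper `relative_improvement`, the toy numbers, and the use of "always predict the mean" as the existing method are all assumptions made for illustration.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def relative_improvement(y_true, model_pred, baseline_pred):
    """Fraction by which the model reduces MAE relative to the baseline.

    Positive means the model beats the baseline; zero or negative means it
    adds nothing over the existing method.
    """
    baseline_error = mae(y_true, baseline_pred)
    model_error = mae(y_true, model_pred)
    return (baseline_error - model_error) / baseline_error


# Toy numbers (assumption), in original occupancy units.
y_true = np.array([10.0, 20.0, 30.0, 40.0])
baseline_pred = np.full_like(y_true, y_true.mean())  # "existing method": predict the mean
model_pred = np.array([12.0, 18.0, 33.0, 37.0])

gain = relative_improvement(y_true, model_pred, baseline_pred)
print(f"MAE reduced by {gain:.0%} relative to the baseline")
```

      If each unit of error has a known cost, multiplying the MAE difference by that cost gives a first estimate of the economic value of the model.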

    Hope this helps.