regression geospatial random-forest prediction

How to improve random forest regression prediction results


I am working on parking occupancy prediction using random forest regression. I have 6 features. I implemented a random forest model, but the results are not good, and since I am very new to this, I do not know what kind of model is suitable for this kind of problem. My dataset is huge: 47 million rows. I also used random search CV, but I could not improve the model. Kindly have a look at the code below and help me improve it, or suggest another model.

[Image: Random forest regression code]

The features are extracted from the location data of the parking lots using a buffer. Kindly help me improve the model.
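For readers unfamiliar with buffer features, the idea can be sketched as counting points of interest within a fixed radius around each parking lot. This is a minimal pure-Python sketch, not the asker's code. It assumes coordinates are already in a projected CRS measured in metres, and the names `count_within_buffer`, `buffer_features`, the 200 m radius, and the toy data are all hypothetical. Only the `*_pts` features fit this counting pattern; `population`, `res_percent` and `com_percent` would instead need area-weighted aggregation. In a real GIS workflow one would more likely use GeoPandas (`GeoSeries.buffer` plus `geopandas.sjoin`) with a spatial index rather than this brute-force loop.

```python
import math
from typing import Dict, Iterable, List, Tuple

# (x, y) in a projected CRS with metre units (assumption for this sketch)
Point = Tuple[float, float]


def count_within_buffer(center: Point, pois: Iterable[Point], radius_m: float) -> int:
    """Count points of interest that fall inside a circular buffer around `center`."""
    cx, cy = center
    return sum(1 for px, py in pois if math.hypot(px - cx, py - cy) <= radius_m)


def buffer_features(
    lots: List[Point], layers: Dict[str, List[Point]], radius_m: float
) -> List[Dict[str, int]]:
    """Return one {layer_name: count} dict per parking lot (brute force, O(lots x POIs))."""
    return [
        {name: count_within_buffer(lot, pts, radius_m) for name, pts in layers.items()}
        for lot in lots
    ]


# Hypothetical toy data: two parking lots and two POI layers
lots = [(0.0, 0.0), (1000.0, 0.0)]
layers = {
    "restaurants_pts": [(50.0, 50.0), (120.0, -30.0), (980.0, 10.0)],
    "bank_pts": [(900.0, 0.0)],
}

feats = buffer_features(lots, layers, radius_m=200.0)
print(feats)
```

With these toy inputs the first lot sees two restaurants and no bank, while the second sees one of each. Note that every row of a given lot gets exactly the same counts, which matters for the answer that follows.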


Solution

  • So, the variables you use are: ['restaurants_pts', 'population', 'res_percent', 'com_percent', 'supermarkt_pts', 'bank_pts']

    What I notice is that, for a given parking lot, none of those variables change, so the regression can only predict a single "average" occupancy for that lot. A key part of your problem seems to be that occupancy is not the same at 5pm and at 4am...

    I'd suggest you work on a time variable (e.g. arrival) to make it usable. On its own, a raw timestamp cannot be understood by the model, but you can derive categories from it. For example, add a preprocessing step that extracts only the HOUR from the variable, then build categories from it (either one category per hour, or broader ranges such as ['midnight - 6am', '6am - 10am', '10am - 2pm', '2pm - 6pm', '6pm - midnight'])
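Both points in the answer can be illustrated with a small pure-Python sketch. Rows that share identical feature values cannot be told apart by a tree-based model, so for squared error the best it can do is roughly the mean target of each group. Grouping by lot alone therefore yields one flat value per lot, while grouping by (lot, hour bin) lets the prediction vary over the day. The helper names (`hour_bin`, `one_hot`, `group_means`), the bin labels, the `arrival`/`occupancy` field names, and the toy rows are assumptions for illustration. The group means stand in for what a model could learn at best, not for an actual random forest fit. In practice, scikit-learn's `OneHotEncoder` or pandas' `get_dummies` would handle the one-hot step.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hour-of-day ranges following the answer's example: [start, end) -> label (assumed labels)
BINS = [(0, 6, "00-06"), (6, 10, "06-10"), (10, 14, "10-14"), (14, 18, "14-18"), (18, 24, "18-24")]


def hour_bin(ts: datetime) -> str:
    """Map a timestamp to its hour-of-day category."""
    for start, end, label in BINS:
        if start <= ts.hour < end:
            return label
    raise ValueError(f"unexpected hour {ts.hour}")


def one_hot(label: str) -> dict:
    """One indicator column per hour bin, so the model sees no false ordering."""
    return {f"hour_{lbl}": int(lbl == label) for _, _, lbl in BINS}


def group_means(rows, key):
    """Mean occupancy per group: the best constant prediction for rows a model cannot separate."""
    groups = defaultdict(list)
    for r in rows:
        groups[key(r)].append(r["occupancy"])
    return {k: mean(v) for k, v in groups.items()}


# Hypothetical toy rows for one parking lot: quiet at 4am, busy at 5pm
rows = [
    {"lot": "A", "arrival": datetime(2023, 5, 1, 4, 0), "occupancy": 0.10},
    {"lot": "A", "arrival": datetime(2023, 5, 1, 17, 0), "occupancy": 0.90},
    {"lot": "A", "arrival": datetime(2023, 5, 2, 4, 30), "occupancy": 0.20},
    {"lot": "A", "arrival": datetime(2023, 5, 2, 17, 15), "occupancy": 0.80},
]

# Static location features only: every row of lot A gets the same prediction
static_only = group_means(rows, key=lambda r: r["lot"])
# Adding the hour bin lets the prediction follow the daily pattern
with_time = group_means(rows, key=lambda r: (r["lot"], hour_bin(r["arrival"])))

print(static_only)
print(with_time)
print(one_hot(hour_bin(rows[1]["arrival"])))
```

With location features alone, lot A is stuck at about 0.5 for every hour. With the hour bin, it gets about 0.15 overnight and about 0.85 in the late afternoon.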