python machine-learning random-forest prediction forecasting

Sales Order Delivery Time Prediction Using Random Forest


This is a beginner question. I have implemented a random forest to predict the number of days a delivery takes, based on origin, destination, vendor, etc. I trained it on the past 12 months of data (80% train, 20% test) and got good results.

My question: when training the model I already had the number of days taken for delivery, but future data will not have that column. How am I supposed to use the already trained model to make future predictions from origin, destination, dates, etc.?


Solution

  • This is my random forest. As you can see, I split the dataset into two pieces, y and x: y is the target column (the value to predict) and x is the whole dataset minus y. The model is fitted on x and y from the training set; after that it only needs x to predict, in your case, the delivery time.

    NOTE: this code is for a random forest REGRESSOR. If you need the classifier code, let me know!

    Just the dataframe definitions:

    y = df[targetkolom]  # target column (the value to predict)
    x = df.drop(columns=targetkolom)  # whole dataset minus the target column
    

    Whole code:

    import pandas as pd
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error, r2_score
    import matplotlib.pyplot as plt
    from sklearn.ensemble import RandomForestRegressor
    
    df = pd.read_csv('Dataset Carprices.csv')
    df.head()
    df = df.drop(columns=['car_ID', 'highwaympg', 'citympg'])
    
    targetkolom = 'price'
    
    
    #Preparation of CarName: keep only the first word (the make)
    df['CarName'] = df['CarName'].str.split().str[0]
        
    pd.set_option('display.max_columns', 200)
    #(df.describe())
    
    #One-hot encode the categorical columns
    #(dtype=float keeps the dummy columns numeric for the scaling step below)
    df = pd.get_dummies(df, columns=['CarName','fueltype','aspiration','doornumber','carbody',
                                     'drivewheel','enginelocation','enginetype','cylindernumber',
                                     'fuelsystem'], prefix="", prefix_sep="", dtype=float)
    
    #print(df.info())
         
    y = df[targetkolom]
    x = df.drop(columns=targetkolom)
    
    #Min-max normalisation to [0, 1] (optional for a random forest: tree splits do not depend on feature scale)
    x = (x-x.min())/(x.max()-x.min())
    
    x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=0.3 ,random_state=7)
    
    model = RandomForestRegressor(n_estimators=10000, random_state=1)
    
    
    model.fit(x_train, y_train)
    
    y_pred = model.predict(x_test)
    
    print('Root Mean Squared Error:', np.sqrt(mean_squared_error(y_test, y_pred)))
    print('R2 score:', r2_score(y_test,y_pred))
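
To come back to the original question about future orders: once the model is fitted, new rows only need the feature columns, not the target. Below is a minimal sketch of that step, not a definitive implementation. The data, the column names (`origin`, `destination`, `vendor`, `delivery_days`) and the variable names are made up for illustration, and the one-hot encoding mirrors the `pd.get_dummies` approach used above.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical historical orders: the target column (delivery_days) is known.
history = pd.DataFrame({
    "origin":        ["NYC", "NYC", "LA", "LA", "CHI", "CHI", "NYC", "LA"],
    "destination":   ["LA", "CHI", "NYC", "CHI", "NYC", "LA", "LA", "NYC"],
    "vendor":        ["A", "B", "A", "B", "A", "B", "B", "A"],
    "delivery_days": [5, 3, 5, 4, 3, 4, 6, 5],
})
target = "delivery_days"

# Train on the features (x) and the target (y), as in the code above.
x_train = pd.get_dummies(history.drop(columns=target), dtype=float)
y_train = history[target]
model = RandomForestRegressor(n_estimators=100, random_state=1)
model.fit(x_train, y_train)

# Future orders: the same feature columns, but no delivery_days column.
future = pd.DataFrame({
    "origin":      ["NYC", "CHI"],
    "destination": ["LA", "NYC"],
    "vendor":      ["A", "C"],  # vendor "C" never appeared in the training data
})

# Encode the new rows the same way, then align them to the training columns:
# dummy columns missing from the new data are filled with 0, and categories
# the model never saw (vendor "C") are dropped.
x_future = pd.get_dummies(future, dtype=float).reindex(
    columns=x_train.columns, fill_value=0.0
)

# predict() only needs the features; the result is the estimated target.
predicted_days = model.predict(x_future)
future["predicted_delivery_days"] = predicted_days
print(future)
```

The important part is that the new data goes through exactly the same preparation as the training data and ends up with the same columns in the same order. If you scale the features as in the code above, reuse the minimum and maximum computed on the training data instead of recomputing them on the new rows.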