python machine-learning random-forest prediction forecasting

Sales Order Delivery time Prediction Using Random Forest

This is a very basic question. I have implemented a random forest to predict the number of days a delivery takes, based on origin, destination, vendor, etc. I trained it on the past 12 months of data (80% train, 20% test) and got good results.

My question: to train the RF I had the number of days taken for delivery, but future data will not have that column. How am I supposed to use the already trained model to make future predictions from origin, destination, dates, etc.?

Solution

This is my random forest. As you can see, I split the dataset into two pieces: y and x. y is the target column (the value to be predicted) and x is the whole dataset minus y. The target is only needed for training; once the model is fitted, you pass it new rows containing just the x columns and it predicts, in your case, the delivery time.

NOTE: this code is for a random forest REGRESSOR. If you need the classifier code, let me know!

Just the dataframe definitions:

y = df[targetkolom]  # target column (the value to predict)
x = df.drop(targetkolom, axis=1)  # whole dataset minus the target column
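To tie this back to the question: the target column is only needed when calling fit, while predict takes just the feature columns. Here is a minimal sketch with made-up data; the column names `distance_km`, `vendor_id` and `days` are hypothetical and not taken from your dataset.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical historical orders: features plus the known delivery time
history = pd.DataFrame({
    'distance_km': [10, 250, 40, 600, 120, 15],
    'vendor_id':   [1, 2, 1, 3, 2, 1],
    'days':        [1, 4, 2, 7, 3, 1],
})

target = 'days'
y = history[target]                  # only used for training
x = history.drop(columns=[target])   # features

model = RandomForestRegressor(n_estimators=50, random_state=1)
model.fit(x, y)

# Hypothetical future orders: same feature columns, no 'days' column
future = pd.DataFrame({
    'distance_km': [30, 500],
    'vendor_id':   [1, 3],
})

# Select the columns in the same order as during training, then predict
future_pred = model.predict(future[x.columns])
print(future_pred)
```

So for future data you simply leave the target out and call predict on the remaining columns.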

Whole code:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv('Dataset Carprices.csv')
df.head()
df = df.drop(['car_ID', 'highwaympg', 'citympg'], axis=1)

targetkolom = 'price'


#Preparation on CarName: keep only the first word of each name
df['CarName'] = df['CarName'].str.split().str[0]
    
pd.set_option('display.max_columns', 200)
#(df.describe())

#One-hot encoding of the categorical columns
#(dtype=float keeps the dummy columns numeric for the normalisation step)
df = pd.get_dummies(df, columns=['CarName','fueltype','aspiration','doornumber','carbody',
                                 'drivewheel','enginelocation','enginetype','cylindernumber',
                                 'fuelsystem'], prefix="", prefix_sep="", dtype=float)

#print(df.info())
     
y = df[targetkolom]
x = df.drop(targetkolom, axis=1)

#Normalisation (min-max scaling to [0, 1]; tree-based models do not strictly need it)
x = (x-x.min())/(x.max()-x.min())

x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=0.3 ,random_state=7)

model = RandomForestRegressor(n_estimators=10000, random_state=1)


model.fit(x_train, y_train)

y_pred = model.predict(x_test)

print('Root Mean Squared Error:', np.sqrt(mean_squared_error(y_test, y_pred)))
print('R2 score:', r2_score(y_test,y_pred))
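One thing to watch when you later predict on future orders: the new rows have to go through the same preparation as the training data. That means the same one-hot columns in the same order, and the same min/max values for the normalisation. Below is a rough sketch of one way to do that, using made-up data; the columns `origin`, `distance_km` and `days` are hypothetical, and using `reindex` to align the columns is my suggestion, not part of the code above.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical training data: one categorical and one numeric feature
train = pd.DataFrame({
    'origin':      ['NL', 'DE', 'NL', 'FR', 'DE', 'FR'],
    'distance_km': [10, 250, 40, 600, 120, 500],
    'days':        [1, 4, 2, 7, 3, 6],
})

y_train = train['days']
x_train = pd.get_dummies(train.drop(columns=['days']),
                         columns=['origin'], dtype=float)

# Remember what the model saw during training
feature_columns = x_train.columns
x_min, x_max = x_train.min(), x_train.max()

x_train = (x_train - x_min) / (x_max - x_min)

model = RandomForestRegressor(n_estimators=50, random_state=1)
model.fit(x_train, y_train)

# Hypothetical future orders: no 'days' column, and no 'FR' origin
future = pd.DataFrame({
    'origin':      ['NL', 'DE'],
    'distance_km': [30, 300],
})

x_future = pd.get_dummies(future, columns=['origin'], dtype=float)
# Add dummy columns that are missing (filled with 0), drop unknown ones,
# and put everything in the training column order
x_future = x_future.reindex(columns=feature_columns, fill_value=0)
# Scale with the min/max from training, not from the new data
x_future = (x_future - x_min) / (x_max - x_min)

future_pred = model.predict(x_future)
print(future_pred)
```

Note that a category the model never saw in training (for example a new vendor) ends up with all of its dummy columns set to 0 here, so predictions for such rows should be treated with caution.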