This is a very noob question. But I have implemented Random forest algorithm to predict number of days taken for delivery depending on origin, destination, vendor, etc. I already implemented RF using the past 12 month's data(80% Train,20% Test data) and got good results
My question is that for implementing RF I already had no. of days taken for delivery but for the future In my dataset, I will not have that column. How am I suppose to use this already trained model for future predictions using origin, destination, dates, etc?
This is my randomforest, as you can see i split the dataset in 2 pieces: y and x. y is the predicted value or column and x is the whole dataset minus y. This way you can use your training set to predict in your case the delivery time.
NOTE: this code is for a forest REGRESSOR, if you need the classifier code, let me know!
Just the dataframe definitions:
y = df[targetkolom] #predicted column or target column
x = df.drop(targetkolom, 1) #Whole dataset minus target column
Whole code:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
df = pd.read_csv('Dataset Carprices.csv')
df.head()
df = df.drop(['car_ID', 'highwaympg', 'citympg'], 1)
targetkolom = 'price'
#Preperation on CarName
i =0
while i < len(df.CarName):
df.CarName[i] = df.CarName[i].split()[0]
i += 1
pd.set_option('display.max_columns', 200)
#(df.describe())
#Dataset standardization
df = pd.get_dummies(df, columns=['CarName','fueltype','aspiration','doornumber','carbody',
'drivewheel','enginelocation','enginetype','cylindernumber',
'fuelsystem'], prefix="", prefix_sep="")
#print(df.info())
y = df[targetkolom]
x = df.drop(targetkolom, 1)
#Normalisation
x = (x-x.min())/(x.max()-x.min())
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=0.3 ,random_state=7)
model = RandomForestRegressor(n_estimators=10000, random_state=1)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
print('Root Mean Squared Error:', np.sqrt(mean_squared_error(y_test, y_pred)))
print('R2 score:', r2_score(y_test,y_pred))