I have two .csv files: one is test.csv and the other is train.csv. As you might expect, the test file does not have the target column ('y' in this case), while the train file does.
What I want to do is first use the train file to train the system entirely, then use the test file just to see the predictions.
I'm using train_test_split() from sklearn.model_selection to create the train and test examples, but it only splits a single dataset. I want to train the system on the train file first and, once that has finished, load the test data from test.csv and make the predictions.
So first I tried the classic way, but with a very small test size so that it behaves like "this file is used for training only":
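import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

dataset = pd.read_csv(r'path\train.csv', sep=",")
X = dataset.drop('y', axis=1)  # feature columns
y = dataset['y']               # target column
# keep almost everything for training by using a tiny test fraction
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.001, random_state=45)
clf = SVC(kernel='rbf')
clf.fit(X_train, y_train)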
But then, when it comes to the real test part (where I want to use the data in test.csv, which has no target values), how can I import test.csv so that I can use its data with the model trained above?
#get data from test.csv as somehow X_test
clfPredict = clf.predict(X_test)
If this is not possible using train_test_split(), what's the proper way to accomplish this task?
You need to load the train CSV into a DataFrame (df1 here) and split it into the target and the features:
y_train = df1['Y column']
X_train = df1.drop('Y column', axis=1)
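As for the test set, load test.csv into a second DataFrame (df2 here) and use it as is:
X_test = df2
The predicted y values for the test set will then be the result of clf.predict(X_test).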
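Putting the steps above together, here is a minimal end-to-end sketch. It assumes the target column is named 'y' as in the question, and it uses small made-up in-memory CSVs with hypothetical feature columns a and b in place of the real train.csv and test.csv; with real files you would pass their paths to pd.read_csv instead. Note that train_test_split() is not needed at all here, since the two files already are the split.

```python
import io

import pandas as pd
from sklearn.svm import SVC

# Hypothetical stand-ins for train.csv and test.csv (made-up data);
# with real files, pass the file paths to pd.read_csv instead.
train_csv = io.StringIO("a,b,y\n0.0,0.1,0\n0.2,0.0,0\n1.0,0.9,1\n0.9,1.1,1\n")
test_csv = io.StringIO("a,b\n0.1,0.1\n1.0,1.0\n")

df1 = pd.read_csv(train_csv)  # has the target column 'y'
df2 = pd.read_csv(test_csv)   # has no target column

# Split the training data into features and target
X_train = df1.drop('y', axis=1)
y_train = df1['y']

# Use the test file as features, with the same columns in the same order
X_test = df2[X_train.columns]

# Train on the whole training file
clf = SVC(kernel='rbf')
clf.fit(X_train, y_train)

# Predict the missing target values for the test file
predictions = clf.predict(X_test)
print(predictions)
```

Selecting df2[X_train.columns] is a small safety measure: it raises a KeyError if the test file is missing a feature column, and it drops any extra columns (such as an id) that the model was not trained on.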