I have the dataset below.
I've built a logistic regression model from it, checked the accuracy, and it works fine. Now I have a new record with Age 30 and EstimatedSalary 50000, and I would like to predict whether Purchased will be 0 or 1. How do I pass the new values 30 and 50000 to my Python code?
Below is the Python code I've used.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Jupyter/IPython magic; remove this line when running as a plain .py script
%matplotlib inline
dataset = pd.read_csv(r"suv_data.csv")
# Features: the columns at positions 0 and 1; label: the column at position 2
X = dataset.iloc[:, [0, 1]].values
y = dataset.iloc[:, 2].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# Fit the scaler on the training set only, then reuse it on the test set
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
accuracy_score(y_test, y_pred) * 100
Regards,
Bharath Vikas
In general, to evaluate a trained model (i.e. call .predict in sklearn), you need to pass samples that have the same shape as the samples the model was trained on: the same number of features, in the same order, with the same preprocessing applied.
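A minimal sketch of what "same shape" means in practice, assuming scikit-learn is installed. The training data here is synthetic and the labeling rule is hypothetical, used only so the model has two classes to learn; it is not the real suv_data.csv. It shows that a single new sample must still be a 2-D array of shape (1, n_features), and that the already-fitted scaler must be reused on it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the (Age, EstimatedSalary) features, NOT the real dataset
rng = np.random.RandomState(0)
X_train = np.column_stack([
    rng.randint(18, 60, size=200),         # hypothetical ages
    rng.randint(15000, 150000, size=200),  # hypothetical salaries
]).astype(float)
# Hypothetical labeling rule, only so the example has two classes
y_train = ((X_train[:, 0] > 40) & (X_train[:, 1] > 60000)).astype(int)

sc = StandardScaler()
classifier = LogisticRegression(random_state=0)
classifier.fit(sc.fit_transform(X_train), y_train)

# The model was trained on 2 features, so every sample needs 2 columns
print(classifier.n_features_in_)

# A single new sample is still 2-D: shape (1, n_features)
new_sample = np.array([[30, 50000]], dtype=float)
print(new_sample.shape)

# Reuse the SAME fitted scaler (transform, not fit_transform) before predicting
prediction = classifier.predict(sc.transform(new_sample))
print(prediction)

# A flat 1-D array is rejected, because sklearn expects (n_samples, n_features)
rejected_1d = False
try:
    sc.transform(np.array([30, 50000], dtype=float))
except ValueError as err:
    rejected_1d = True
    print("ValueError:", err)
```

If you do start from a flat list of values, np.array(values).reshape(1, -1) turns it into the required single-row 2-D array.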
In your case I suppose (see my comment on your question) you wanted to train on samples with Age and EstimatedSalary as features, using Purchased as the label.
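One way to make sure the features really are Age and EstimatedSalary is to select the columns by name instead of by position. A minimal sketch with pandas; the rows and the extra "User ID" and "Gender" columns below are hypothetical, chosen to show how positional selection with iloc can silently pick the wrong columns if the file has more columns than expected.

```python
import pandas as pd

# Hypothetical rows; the extra columns are an assumption for illustration only
dataset = pd.DataFrame({
    "User ID": [101, 102, 103, 104],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [19, 35, 47, 52],
    "EstimatedSalary": [19000, 20000, 25000, 90000],
    "Purchased": [0, 0, 1, 1],
})

# Positional selection depends on the column layout of the file
positional_columns = list(dataset.iloc[:, [0, 1]].columns)
print(positional_columns)  # with this layout: ['User ID', 'Gender']

# Selection by name always picks the intended features and label
feature_columns = ["Age", "EstimatedSalary"]
X = dataset[feature_columns].values  # shape (n_samples, 2)
y = dataset["Purchased"].values      # shape (n_samples,)
print(X.shape, y.shape)
```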
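Then, to test on a single sample, try this:
# One row per sample, columns in the same order as during training (Age, EstimatedSalary)
single_test_sample = pd.DataFrame({'Age': [30], 'EstimatedSalary': [50000]}).iloc[:, [0, 1]].values
# Reuse the scaler fitted on the training data; do not refit it on the new sample
single_test_sample = sc.transform(single_test_sample)
single_test_prediction = classifier.predict(single_test_sample)  # array with one label, 0 or 1
print(single_test_prediction)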
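An alternative worth considering (not what the code above does) is to bundle the scaler and the classifier in a scikit-learn Pipeline, so the fitted scaling is applied automatically inside predict and cannot be forgotten for new samples. A minimal sketch on synthetic data with a hypothetical labeling rule; predict_proba additionally gives the estimated probability that Purchased is 1.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic training data standing in for the real Age/EstimatedSalary columns
rng = np.random.RandomState(1)
X_train = np.column_stack([
    rng.randint(18, 60, size=300),
    rng.randint(15000, 150000, size=300),
]).astype(float)
y_train = (X_train[:, 0] + X_train[:, 1] / 2000 > 80).astype(int)  # hypothetical rule

# The pipeline stores the fitted scaler and reapplies it inside predict()
model = make_pipeline(StandardScaler(), LogisticRegression(random_state=0))
model.fit(X_train, y_train)

new_sample = np.array([[30, 50000]], dtype=float)
label = model.predict(new_sample)[0]           # 0 or 1
proba = model.predict_proba(new_sample)[0, 1]  # estimated P(Purchased = 1)
print(label, round(proba, 3))
```

Because the pipeline runs the same steps as the manual scaler-then-classifier code, both approaches give the same prediction; the pipeline just removes the separate sc.transform call.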
Note that you can also add more values to the Age and EstimatedSalary columns of the test DataFrame; here I only added the sample you were interested in. If you add more rows, the model will output one prediction per row of the test DataFrame.
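A minimal sketch of predicting several new rows at once. The training data is synthetic with a hypothetical labeling rule, and the extra customers (ages 45 and 58) are made-up values for illustration; the point is that predict returns one label per input row, which can be attached back to the DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic training data, NOT the real dataset
rng = np.random.RandomState(2)
train = pd.DataFrame({
    "Age": rng.randint(18, 60, size=200),
    "EstimatedSalary": rng.randint(15000, 150000, size=200),
})
y_train = ((train["Age"] > 40) | (train["EstimatedSalary"] > 100000)).astype(int)

sc = StandardScaler()
classifier = LogisticRegression(random_state=0)
classifier.fit(sc.fit_transform(train.values), y_train.values)

# Several new samples: one row each, same column order as in training
new_samples = pd.DataFrame({
    "Age": [30, 45, 58],
    "EstimatedSalary": [50000, 80000, 120000],
})
predictions = classifier.predict(sc.transform(new_samples.values))
new_samples["PredictedPurchased"] = predictions  # one prediction per row
print(new_samples)
```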
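Also note that both your code and mine would work without the .values at the end when building the train/test sets, since sklearn estimators accept pandas DataFrames directly. If you fit on a DataFrame, though, the scaler remembers the column names, so pass new samples as a DataFrame with the same column names as well.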