I am trying to evaluate my model with train_test_split. I defined the following function to create the output column (top) in the table according to the function's input:
def top_sh(num):
    ###Get the top(num) in Shanghai data and arrange
    ####input and output variables accordingly
    #Add column to be output value, either zero or one
    #shanghai = shanghai_cp.copy()
    if 'top' in shanghai.columns:
        shanghai.drop(columns = shanghai.columns[-1],inplace = True)
    shanghai['top'] = shanghai['world_rank'].apply(lambda x: 1 if x<= num else 0)
    out = print('*****************'+ '\n' + 'Output array: Top'+ str(num)+ '\n' + 'Disregarding in Analysis: World rank')
    #call = print(shanghai.head(15))
    return out
Then I defined the train/test split procedure as follows:
def train_test(df,size, seed):
    ###Split the data into test and train sets and test
    #Get input output of df
    if df == 'shanghai':
        column1 = shanghai.columns[1:7]
        Y = shanghai.values[: , -1].astype(int)
        y = np.ravel(Y)
        X = shanghai.values[: , 1:7]
    elif df == 'times':
        column1 = times.columns[1:10]
        Y = times.values[: , -1].astype(int)
        y = np.ravel(Y)
        X = times.values[: , 1:10]
    else:
        return print('Available Datasets: "shanghai" , "times"')
    #Split into train and test
    X_Train, X_Test, Y_Train, Y_Test = train_test_split(X,Y, test_size=size, random_state=seed)
    #Get the regression
    model= LogisticRegression(solver='liblinear')
    model.fit(X_Train,X_Test)
    #See how accurately it is with the split
    result=model.score(X_Test,Y_Test)
    print(f'Accuaracy {result*100:5.3f}')
    return
I run the following code:
top_sh(50)
shanghai.head()
X.shape
Y
Y.shape
train_test('shanghai',0.3,7)
X.shape = (768, 8)
Y.shape = (768, )
I get the following error in the train_test function, specifically on the model.fit line:
> ValueError: bad input shape (150, 6)
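The issue most likely arises from what you pass to fit. It expects the X values as predictors and the Y values as targets, but this line passes the test features, a 2-D array of shape (150, 6), where the training labels should go, which is what triggers the bad input shape error:
model.fit(X_Train,X_Test)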
You should instead pass Y_Train:
model.fit(X_Train,Y_Train)
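For reference, here is a minimal, self-contained sketch of the corrected flow. It assumes synthetic data from scikit-learn's make_classification as a stand-in for the Shanghai table (the real data is not shown in the question), with 500 rows and 6 features chosen only so that a 30% test split has 150 rows, matching the shape in the error message.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: synthetic stand-in for the question's data
# (6 feature columns and a binary 0/1 target like the 'top' column).
X, Y = make_classification(n_samples=500, n_features=6, random_state=7)

X_Train, X_Test, Y_Train, Y_Test = train_test_split(
    X, Y, test_size=0.3, random_state=7
)

# fit() takes the training features and the matching 1-D training labels.
print(X_Train.shape, Y_Train.shape)  # (350, 6) (350,)

model = LogisticRegression(solver="liblinear")
model.fit(X_Train, Y_Train)

# score() is then evaluated on the held-out split.
result = model.score(X_Test, Y_Test)
print(f"Accuracy {result * 100:5.3f}")
```

Printing the shapes before calling fit is a quick way to confirm that the first argument is 2-D (samples by features) and the second is 1-D with the same number of samples.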