python pandas machine-learning cross-validation

Relate the predicted value to its index/identification number


I am training a model to predict true or false based on some data. I drop the product number from the list of features when training and testing the model.

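from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Drop the identifier and the target so neither is used as a feature
X = df.drop(columns=['Product Number', 'result'])
y = df['result']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

SVC = LinearSVC(max_iter=1200)
SVC.fit(X_train, y_train)
y_pred = SVC.predict(X_test)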

Is there any way for me to recover the product number and its features for each item that has passed or failed? How do I relate each result in y_pred to the product number it corresponds to?

I also plan to use cross-validation, so the data will be shuffled. Would there still be a way for me to recover the product number for each test item?


Solution

  • I realised I'm using cross-validation only to evaluate my model's performance, so I decided to run my code without shuffling the data to see the result for each datapoint.

    Edit: For evaluation without cross-validation, I keep the identifier and label columns in the DataFrame and drop them only when I pass the data to the classifier, as shown below:

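    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    cols = ['id', 'label']  # columns that must not be used as features
    X = train_data.copy()
    y = train_data['label']
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=2)

    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=10))
    # Drop 'id' and 'label' only at fit/predict time so X_val still carries them
    y_val_pred = knn.fit(X_train.drop(columns=cols), y_train).predict(X_val.drop(columns=cols))

    # Attach the predictions; row order matches X_val, so each 'id' lines up
    X_val = X_val.assign(y_val_pred=y_val_pred)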

    I attach y_val_pred to X_val after prediction so I can check which datapoints have been misclassified.
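
For the shuffled cross-validation case raised in the question, one possible approach (not something the answer above uses) is scikit-learn's cross_val_predict, which returns one out-of-fold prediction per row in the original row order. The sketch below assumes a DataFrame shaped like train_data with 'id' and 'label' columns; the toy data and the feature names 'f1' and 'f2' are made up purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for train_data. The 'id' and 'label'
# column names follow the answer above; 'f1' and 'f2' are invented.
rng = np.random.default_rng(0)
train_data = pd.DataFrame({
    "id": [f"P{i:03d}" for i in range(100)],
    "f1": rng.normal(size=100),
    "f2": rng.normal(size=100),
})
train_data["label"] = (train_data["f1"] + train_data["f2"] > 0).astype(int)

cols = ["id", "label"]            # not used as features
X = train_data.drop(columns=cols)
y = train_data["label"]

knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=10))
cv = KFold(n_splits=5, shuffle=True, random_state=2)

# Each row is predicted exactly once, by the model trained on the folds
# that did not contain it. The result is ordered like the rows of X,
# even though the folds themselves are shuffled.
y_cv_pred = cross_val_predict(knn, X, y, cv=cv)

# Attach the predictions next to the identifiers and filter the misses.
results = train_data.assign(y_cv_pred=y_cv_pred)
misclassified = results[results["label"] != results["y_cv_pred"]]
print(misclassified[["id", "label", "y_cv_pred"]])
```

For a single hold-out split, the same idea works through the index: train_test_split preserves the DataFrame index, so something like df.loc[X_test.index, 'Product Number'] should line up with y_pred row by row.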