python machine-learning scikit-learn xgboost

How do I handle a model dataset that has a column of IDs?

I am trying to build a model for the probability of success of NFL Draft prospects. I am having trouble finding a way to print the players' names with their corresponding model output. For example, it currently prints something like this: "[79 22 36 72 20 48 2 68 16 36 11 68 68 16 22 17 60 62 15 17 11 68 0 84 28 22 45 48 79 84 2 37 68]". I would like the player associated with each output to be printed as well. I am working from some template code I found online for the type of model I would like to build; it is posted below.

LINK TO DATA: https://docs.google.com/spreadsheets/d/1BQa34rfq7oC3jOO65c4xUqKTuhDGKf46pPwGmjSS3ko/edit?usp=sharing

The "Player" column doesn't matter during training, since this data is historical drafts going back to 2004, but for the final output, when I ask the model to predict this year's prospects, I need the names printed as well.

    import pandas as pd
    import xgboost
    from sklearn import model_selection
    from sklearn.metrics import accuracy_score
    from sklearn.preprocessing import LabelEncoder
    
    # load data
    data = pd.read_csv(r"C:\Users\yanke\Documents\NFLDraft\QBDataSet.csv", index_col=0)
    dataset = data
    
    # split data into X and y
    X = dataset.iloc[:,0:4]
    Y = dataset.iloc[:,4]
    # encode string class values as integers
    label_encoder = LabelEncoder()
    label_encoder = label_encoder.fit(Y)
    label_encoded_y = label_encoder.transform(Y)
    
    seed = 7
    test_size = 0.33
    X_train, X_test, y_train, y_test = model_selection.train_test_split(X, label_encoded_y, test_size=test_size, random_state=seed)
    
    # fit model on training data
    model = xgboost.XGBClassifier()
    model.fit(X_train, y_train)
    print(model)
    
    # make predictions for test data
    y_pred = model.predict(X_test)
    predictions = [round(value) for value in y_pred]
    
    # evaluate predictions
    accuracy = accuracy_score(y_test, predictions)
    print("Accuracy: %.2f%%" % (accuracy * 100.0))
    print(y_pred)

Solution

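Will this work? Because the CSV is read with index_col=0, the player names end up as the DataFrame index, and train_test_split keeps that index on X_test, so you can pair it with the predictions: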

    for player, prediction in zip(X_test.index, predictions):
        print(player, prediction)

Output:

    Colin Kaepernick 3
    Jeff Driskel 2
    Dwayne Haskins 1
    Colt McCoy 1
    Ryan Lindley 2
    Jameis Winston 2
    Sam Darnold 1
    Sam Bradford 1
    Troy Smith 1
    Johnny Manziel 1
    Matthew Stafford 3
    Kyler Murray 2
    Daniel Jones 2
    Gardner Minshew 1
    Joe Webb 2
    Curtis Painter 1
    Andrew Luck 1
    Josh Freeman 2
    Landry Jones 1
    Ryan Finley 1
    Deshaun Watson 1
    Marcus Mariota 1
    Dan Orlovsky 1
    Russell Wilson 2
    Nathan Peterman 1
    Kyle Orton 2
    Paxton Lynch 2
    Alex Smith 1
    Brodie Croyle 1
    Vince Young 2
    Brandon Weeden 1
    Teddy Bridgewater 1
    Brett Hundley 1
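
If you would rather have a table than printed lines, and the original class labels instead of the encoded integers, something like the following might help. It is only a minimal sketch under several assumptions: the toy data, the feature column names (Height, Weight, Forty, Games), the "Outcome" labels, and the new prospect names are all made up for illustration, and DecisionTreeClassifier stands in for xgboost.XGBClassifier so the example runs without xgboost installed. The same pattern should carry over to your model, since it relies only on the pandas index and on LabelEncoder.inverse_transform.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier  # stand-in for xgboost.XGBClassifier

# Hypothetical toy data. In the question the file is read with index_col=0,
# which has the same effect as set_index("Player") here: the names become
# the index instead of a feature column.
data = pd.DataFrame(
    {
        "Player": ["QB A", "QB B", "QB C", "QB D", "QB E", "QB F", "QB G", "QB H"],
        "Height": [74, 76, 75, 73, 77, 72, 75, 76],
        "Weight": [220, 235, 225, 210, 240, 205, 230, 228],
        "Forty": [4.6, 4.9, 4.7, 4.5, 5.0, 4.6, 4.8, 4.7],
        "Games": [30, 40, 25, 45, 38, 50, 28, 35],
        "Outcome": ["Bust", "Starter", "Bust", "Starter",
                    "Bust", "Starter", "Bust", "Starter"],
    }
).set_index("Player")

# Same split as in the question: first four columns are features, fifth is the target.
X = data.iloc[:, 0:4]
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(data.iloc[:, 4])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=7
)

model = DecisionTreeClassifier(random_state=7).fit(X_train, y_train)
y_pred = model.predict(X_test)

# X_test still carries the player names as its index, so reuse it for the results.
results = pd.DataFrame(
    {
        "predicted_code": y_pred,
        # Map the encoded integers back to the original string labels.
        "predicted_label": label_encoder.inverse_transform(y_pred),
    },
    index=X_test.index,
)
print(results)

# Hypothetical new prospects: same feature columns in the same order,
# with the names kept in the index so they never reach the model as a feature.
new_prospects = pd.DataFrame(
    {"Height": [75, 73], "Weight": [226, 212], "Forty": [4.7, 4.5], "Games": [33, 41]},
    index=pd.Index(["Prospect X", "Prospect Y"], name="Player"),
)
new_results = pd.Series(
    label_encoder.inverse_transform(model.predict(new_prospects)),
    index=new_prospects.index,
    name="predicted_label",
)
print(new_results)
```

Keeping the names in the index means the model never sees them, while every prediction can still be traced back to its player.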