I am building a machine learning model in Python that predicts the outcome of a football match. Below is the training code:
features = ['Home Team',..., 'home_team_avg_Sh_last_3', 'Away Team',..., 'away_team_avg_Sh_last_3']
label = ['Match Result']
df_allteammerged[features + label]
Home Team | ... | Away Team | ... | Match Result |
---|---|---|---|---|
Arsenal | ... | Fulham | ... | Home Win |
... | ... | ... | ... | |
Brentford | ... | Everton | ... | Draw |
encode = ['Home Team', 'Away Team', 'Match Result']
from sklearn.preprocessing import LabelEncoder
enc = LabelEncoder()
for e in encode:
    df_allteammerged[e] = enc.fit_transform(df_allteammerged[e])
df_allteammerged[features + label]
Home Team | ... | Away Team | ... | Match Result |
---|---|---|---|---|
0 | ... | 8 | ... | 1 |
... | ... | ... | ... | |
3 | ... | 7 | ... | 2 |
X = df_allteammerged[features].values
y = df_allteammerged[label].values.flatten()
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)
from xgboost import XGBClassifier
model = XGBClassifier(objective="multi:softmax")
model.fit(X_train,y_train)
xgb_pred = model.predict(X_test)
Once the model is trained, I create a DataFrame that holds the actual result (Match Result), the predicted result, the home team and the away team for the test data:
xtesthome = [i[0] for i in X_test]
xtestaway = [i[9] for i in X_test]
df_pred_compare = pd.DataFrame({"Actual Result": y_test, "Predicted Result": xgb_pred, "Home Team": xtesthome,"Away Team": xtestaway})
df_pred_compare
This is then saved to a CSV file.
The main problem is that I want to reverse the encoding on Home Team and Away Team, so that the dataframe/CSV file contains the original team names rather than the numbers.
I tried following the solution from this post: "Python - How to reverse the encoding of data encoded with LabelEncoder after it has been split by train_test_split?"
This included removing the .values from X and y to make them dataframes rather than arrays, but the returned dataframe did not reverse the encoding of Home Team and Away Team.
Any help would be appreciated
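A single LabelEncoder that is refitted inside the loop only remembers the last column it was fitted on, so I would create a separate encoder for each column and store them in a dict to be able to reverse the encoding easily: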
encode = ['Home Team', 'Away Team', 'Match Result']
from sklearn.preprocessing import LabelEncoder
enc_dict = {}
for e in encode:
    enc_dict[e] = LabelEncoder()
    df_allteammerged[e] = enc_dict[e].fit_transform(df_allteammerged[e])
df_allteammerged[features + label]
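To illustrate why one encoder per column matters, here is a minimal sketch on made-up data (the `home` and `result` lists are hypothetical stand-ins for the real columns). It shows that a shared encoder, once refitted, decodes team codes into match results:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy columns, not the real df_allteammerged data.
home = ["Arsenal", "Brentford", "Arsenal"]
result = ["Home Win", "Draw", "Away Win"]

shared = LabelEncoder()
home_codes = shared.fit_transform(home)      # classes_ holds the team names
result_codes = shared.fit_transform(result)  # refit: classes_ now holds the results

# The shared encoder has forgotten the team names entirely.
shared_classes = list(shared.classes_)

# Decoding the team codes with it silently yields match results instead.
decoded_with_shared = list(shared.inverse_transform(home_codes))
print(shared_classes)
print(decoded_with_shared)
```

With a dict of encoders, each column keeps its own `classes_`, so `inverse_transform` maps the codes back to the right labels.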
Then, after all the modelling, you can reverse the encoding like this:
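# int(...) because X_test may hold floats when the encoded columns sit next to numeric features
xtesthome = enc_dict['Home Team'].inverse_transform([int(i[0]) for i in X_test])
xtestaway = enc_dict['Away Team'].inverse_transform([int(i[9]) for i in X_test])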
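As a rough end-to-end sketch of the same idea, the snippet below runs the encode, split and decode steps on a tiny made-up DataFrame. The data, the `home_shots` column and the `home_idx`/`away_idx` names are assumptions for illustration only, and the model fitting step is left out to keep it short:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical miniature of df_allteammerged.
df = pd.DataFrame({
    "Home Team": ["Arsenal", "Brentford", "Chelsea", "Everton"],
    "home_shots": [5.0, 3.5, 4.0, 2.5],
    "Away Team": ["Fulham", "Everton", "Arsenal", "Chelsea"],
    "Match Result": ["Home Win", "Draw", "Away Win", "Draw"],
})
original = df.copy()
features = ["Home Team", "home_shots", "Away Team"]

# One encoder per column, kept in a dict so each can be reversed later.
enc_dict = {}
for col in ["Home Team", "Away Team", "Match Result"]:
    enc_dict[col] = LabelEncoder()
    df[col] = enc_dict[col].fit_transform(df[col])

X = df[features].values          # becomes a float array because of home_shots
y = df["Match Result"].values
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Look the column positions up by name instead of hard-coding them.
home_idx = features.index("Home Team")
away_idx = features.index("Away Team")

df_pred_compare = pd.DataFrame({
    "Actual Result": enc_dict["Match Result"].inverse_transform(y_test),
    # Cast back to int: the codes were stored as floats in X.
    "Home Team": enc_dict["Home Team"].inverse_transform(X_test[:, home_idx].astype(int)),
    "Away Team": enc_dict["Away Team"].inverse_transform(X_test[:, away_idx].astype(int)),
})
print(df_pred_compare)
```

The predicted labels from the model could presumably be decoded the same way, with `enc_dict["Match Result"].inverse_transform(xgb_pred)`, before writing the DataFrame to CSV.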