I'm using TensorFlow Decision Forests to predict a suitable crop based on a few parameters. How do I get the predict() method to return the label?
I'm using this dataset for training.
My code:
import tensorflow_decision_forests as tfdf
import tensorflow as tf
import pandas as pd
import numpy as np
df = pd.read_csv("Crop_recommendation.csv")
#TensorFlow dataset
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(df,label="label")
# Train the model
model = tfdf.keras.RandomForestModel()
model.fit(train_ds)
print(model.summary())
pd_serving_dataset = pd.DataFrame({
"N": [83],
"P": [45],
"K" : [30],
"temperature" : [25],
"humidity" : [80.3],
"ph" : [6],
"rainfall" : [200.91],
})
tf_serving_dataset = tfdf.keras.pd_dataframe_to_tf_dataset(pd_serving_dataset)
prediction = model.predict(tf_serving_dataset)
print(prediction)
My output:
1/1 [==============================] - 0s 38ms/step
[[0. 0. 0. 0. 0.02333334 0.07666666
0.04666667 0. 0.08333332 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0.7699994 0. ]]
Expected output: rice
For a classification problem, TensorFlow Decision Forests (TF-DF) returns the probabilities for each class as a NumPy array, with one row per example and one column per class. If you need the class names, you have to find the class with the highest probability and map it back to its name.
Since Keras expects class labels to be integers, TF-DF silently converts string labels in tfdf.keras.pd_dataframe_to_tf_dataset by sorting the unique class names and mapping each name to its index in that sorted list. To get the names, you therefore have to revert the mapping. Overall, you would get
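# sorted() returns a new list; list.sort() sorts in place and returns None
classification_names = sorted(df["label"].unique().tolist())
prediction = model.predict(tf_serving_dataset)
# Pick the most probable class index per row and look up its name
class_predictions = [classification_names[i] for i in np.argmax(prediction, axis=1)]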
# Since you only predicted a single example, class_predictions = ['rice']
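The decoding step can be checked without training a model. Below is a minimal sketch in plain Python, assuming the class order is the sorted list of unique label strings as described above. The labels, the probability rows, and the helper name decode_predictions are made up for illustration and are not part of TF-DF.

```python
def decode_predictions(probabilities, class_names):
    """Return the name of the most probable class for each row."""
    decoded = []
    for row in probabilities:
        # Index of the largest probability in this row (a plain-Python argmax).
        best_index = max(range(len(row)), key=lambda i: row[i])
        decoded.append(class_names[best_index])
    return decoded


# Hypothetical training labels and probabilities, for illustration only.
labels_in_training_data = ["rice", "maize", "rice", "coffee"]
class_names = sorted(set(labels_in_training_data))  # ['coffee', 'maize', 'rice']
fake_prediction = [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]

decoded = decode_predictions(fake_prediction, class_names)
print(decoded)  # ['rice', 'coffee']
```

With the real model, the rows would come from model.predict(tf_serving_dataset).tolist(), and class_names from the sorted unique values of the label column.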
Warning: TF-DF might change the way tfdf.keras.pd_dataframe_to_tf_dataset maps classes to integers at some point in the future. It would be more prudent to perform the mapping yourself by preprocessing the pandas dataframe, before converting it to a TensorFlow dataset, with
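label = "label"  # name of the label column in the dataframe
classes = df[label].unique().tolist()
print(f"Label classes: {classes}")
# Replace each class name with its integer index in `classes`
df[label] = df[label].map(classes.index)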
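The benefit of doing the mapping yourself is that you keep the exact list needed to turn integers back into names. A small sketch of that round trip, using plain Python lists and hypothetical labels rather than the real dataframe:

```python
# Hypothetical label column, for illustration only.
labels = ["rice", "maize", "rice", "coffee"]

# Fix the class order explicitly so encoding and decoding always agree.
classes = sorted(set(labels))
label_to_index = {name: index for index, name in enumerate(classes)}

# Encode: class name -> integer, as you would before training.
encoded = [label_to_index[name] for name in labels]

# Decode: integer -> class name, as you would after taking the argmax.
decoded = [classes[index] for index in encoded]

print(encoded)  # [2, 1, 2, 0]
print(decoded)  # ['rice', 'maize', 'rice', 'coffee']
```

Storing classes alongside the saved model is one way to make sure the same order is used at serving time.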
For more information on how to make (fast) predictions with TF-DF, you can also check out the TF-DF predictions colab.