Here's the link to the decision tree implementation I used: https://www.geeksforgeeks.org/decision-tree-implementation-python/
My dataframe contains only rows labelled "A" or "B", and each row has 512 values.
data
1 2 ... 509 510 511 512
A 0.005190 0.00173 ... 0.001730 0.000577 0.002884 0.000577
A 0.000597 0.006567 ... 0.000597 0.000597 0.001194 0.001194
B 0.000582 0.010477 ... 0.001746 0.001164 0.001243 0.003108
A 0.009323 0.001865 ... 0.001865 0.001243 0.003108 0.000622
A 0.000531 0.003186 ... 0.003186 0.001593 0.002124 0.001062
...
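from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X = data.values[:, 1:5]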
Y = data.values[:, 0]
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.3, random_state = 100)
clf_gini = DecisionTreeClassifier(criterion = "gini", random_state = 100,max_depth=3, min_samples_leaf=5)
clf_gini.fit(X_train, y_train)
However, it raises a ValueError on the last line, when I call the fit
function. It still fails even if I change the parameter values.
ValueError Traceback (most recent call last)
<ipython-input-19-484db0a3d479> in <module>
1 # Train with gini
2 clf_gini = DecisionTreeClassifier(criterion = "gini", random_state = 100,max_depth=3, min_samples_leaf=5)
----> 3 clf_gini.fit(X_train, y_train)
~\anaconda3\envs\myenv\lib\site-packages\sklearn\tree\_classes.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
901 """
902
--> 903 super().fit(
904 X, y,
905 sample_weight=sample_weight,
~\anaconda3\envs\myenv\lib\site-packages\sklearn\tree\_classes.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
189
190 if is_classification:
--> 191 check_classification_targets(y)
192 y = np.copy(y)
193
~\anaconda3\envs\myenv\lib\site-packages\sklearn\utils\multiclass.py in check_classification_targets(y)
181 if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
182 'multilabel-indicator', 'multilabel-sequences']:
--> 183 raise ValueError("Unknown label type: %r" % y_type)
184
185
ValueError: Unknown label type: 'continuous'
I'm honestly so confused. Can someone help me out on this? Appreciate it.
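The check that raises in the traceback is check_classification_targets, which first classifies y with sklearn.utils.multiclass.type_of_target. The following is a minimal sketch reproducing that check, assuming only scikit-learn and NumPy. The float values are the first five entries of column 1 from the printout above, and the string labels are a hypothetical stand-in for the A/B row labels.

```python
import numpy as np
from sklearn.utils.multiclass import check_classification_targets, type_of_target

# First five values of column 1 from the printout (what data.values[:, 0] contains)
float_column = np.array([0.005190, 0.000597, 0.000582, 0.009323, 0.000531])
# Hypothetical class labels matching the row labels A/A/B/A/A
string_labels = np.array(["A", "A", "B", "A", "A"])

print(type_of_target(float_column))   # continuous
print(type_of_target(string_labels))  # binary

# The same check a classifier runs inside fit(): it rejects continuous targets
try:
    check_classification_targets(float_column)
except ValueError as err:
    print("Rejected:", err)
```

Floats with a fractional part are reported as 'continuous', which is exactly the label type the error message complains about; two distinct strings are reported as 'binary', which a classifier accepts.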
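You have a problem with your y labels. Y = data.values[:, 0] takes the first feature column (column 1), which holds floats, so scikit-learn classifies the target as 'continuous' and refuses to fit a classifier on it.
If your model should predict whether a sample belongs to class A or B, then, given your dataset, you should use the index as the label y, since it contains the classes ['A', 'B']: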
X = data.values
y = data.index.values
data.values returns the values of all the columns, while data.index.values returns the index as a NumPy array.
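Putting the fix together, here is a sketch of the full pipeline, assuming a synthetic stand-in for the real dataframe: 100 random rows labelled "A" or "B" in the index and 512 numeric columns named 1 to 512. The data is hypothetical; column 1 is shifted for class B only so the tree has something to learn. The classifier parameters are the ones from the question, and the accuracy_score call is an added check, not part of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data: class labels in the index, 512 float columns named 1..512
rng = np.random.default_rng(0)
labels = rng.choice(["A", "B"], size=100)
features = rng.random((100, 512))
features[labels == "B", 0] += 1.0  # make column 1 separate the classes (synthetic signal)
data = pd.DataFrame(features, index=labels, columns=range(1, 513))

X = data.values        # all feature columns, shape (100, 512)
y = data.index.values  # class labels "A"/"B" taken from the index

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=100)

clf_gini = DecisionTreeClassifier(criterion="gini", random_state=100, max_depth=3, min_samples_leaf=5)
clf_gini.fit(X_train, y_train)  # y is now discrete, so no "Unknown label type" error

y_pred = clf_gini.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```

With the real data you would skip the synthetic construction and keep only the lines from X = data.values onward.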