ValueError in DecisionTreeClassifier

Here's the link to the decision tree implementation I used: https://www.geeksforgeeks.org/decision-tree-implementation-python/

My dataframe only has rows labeled "A" or "B", with 512 values in each row.

data

          1         2  ...       509       510       511       512
A  0.005190  0.00173   ...  0.001730  0.000577  0.002884  0.000577
A  0.000597  0.006567  ...  0.000597  0.000597  0.001194  0.001194
B  0.000582  0.010477  ...  0.001746  0.001164  0.001243  0.003108
A  0.009323  0.001865  ...  0.001865  0.001243  0.003108  0.000622
A  0.000531  0.003186  ...  0.003186  0.001593  0.002124  0.001062

...

X = data.values[:, 1:5]
Y = data.values[:, 0]

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.3, random_state = 100)

clf_gini = DecisionTreeClassifier(criterion = "gini", random_state = 100,max_depth=3, min_samples_leaf=5)
clf_gini.fit(X_train, y_train)

However, the last line raises a ValueError when I call the fit function. It doesn't work even if I change the parameter values.

ValueError                                Traceback (most recent call last)
<ipython-input-19-484db0a3d479> in <module>
      1 # Train with gini
      2 clf_gini = DecisionTreeClassifier(criterion = "gini", random_state = 100,max_depth=3, min_samples_leaf=5)
----> 3 clf_gini.fit(X_train, y_train)

~\anaconda3\envs\myenv\lib\site-packages\sklearn\tree\_classes.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    901         """
    902 
--> 903         super().fit(
    904             X, y,
    905             sample_weight=sample_weight,

~\anaconda3\envs\myenv\lib\site-packages\sklearn\tree\_classes.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    189 
    190         if is_classification:
--> 191             check_classification_targets(y)
    192             y = np.copy(y)
    193 

~\anaconda3\envs\myenv\lib\site-packages\sklearn\utils\multiclass.py in check_classification_targets(y)
    181     if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
    182                       'multilabel-indicator', 'multilabel-sequences']:
--> 183         raise ValueError("Unknown label type: %r" % y_type)
    184 
    185 

ValueError: Unknown label type: 'continuous'

I'm honestly so confused. Can someone help me out on this? Appreciate it.

Solution

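The problem is with your y labels. In your dataframe the class labels are stored in the index, not in a column, so data.values[:, 0] picks the first feature column (floats), which scikit-learn treats as a 'continuous' target. If the model should predict whether a sample belongs to class A or B, use the index as the label y, since it contains the classes ['A', 'B']: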

X = data.values
y = data.index.values

data.values returns the values of all columns, while data.index.values returns the index as a NumPy array.
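
To make the difference concrete, here is a minimal sketch on synthetic data. The dataframe below is a made-up stand-in with random values (40 rows is an arbitrary choice), not your actual data, and it assumes the class labels live in the index as shown in the question. It uses scikit-learn's type_of_target helper to show how each candidate target is interpreted, then fits the same classifier with the corrected X and y.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.multiclass import type_of_target

# Synthetic stand-in for the dataframe in the question (assumption):
# class labels in the index, 512 float features per row, random values.
rng = np.random.default_rng(0)
labels = np.array(["A", "B"] * 20)
data = pd.DataFrame(rng.random((40, 512)) * 0.01,
                    index=labels, columns=range(1, 513))

# Original slicing: column 0 of data.values is the first *feature* column,
# so the target is an array of floats.
bad_target_type = type_of_target(data.values[:, 0])

# Corrected slicing: features from the columns, labels from the index.
X = data.values
y = data.index.values
good_target_type = type_of_target(y)

print(bad_target_type, good_target_type)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=100)

clf_gini = DecisionTreeClassifier(criterion="gini", random_state=100,
                                  max_depth=3, min_samples_leaf=5)
clf_gini.fit(X_train, y_train)  # no ValueError with string class labels
predictions = clf_gini.predict(X_test)
```

Since the features here are random, the predictions themselves are meaningless; the point is only that fit accepts the index values as class labels. Note that the original X = data.values[:, 1:5] kept just four columns, whereas X = data.values uses all 512.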