Tags: python, matplotlib, decision-tree, scatter

Plot scatter for classification algorithm


Please help me create a scatter plot for this classification algorithm. Here y is a column of labels (0, 1), and I want the predicted labels plotted in two different colors, one per label.

X = df.iloc[:, 0:11].values  # first 11 columns as features
y = df.iloc[:, 17].values    # column 17 holds the 0/1 labels
dtc = DecisionTreeClassifier()
train_x, test_x, train_y, test_y = train_test_split(X, y, train_size=0.8, shuffle=True)
kf = KFold(n_splits=5)
dtc = dtc.fit(train_x, train_y)
dtc_labels = dtc.predict(test_x)

Solution

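  • I don't have access to your dataframe, but here is a minimal working example, assuming I understood the question correctly.

    The key is to index the NumPy arrays with a boolean mask when plotting: dtc_labels == 0 selects the test rows predicted as class 0, and dtc_labels == 1 selects those predicted as class 1. The last two lines show this. Because each label is drawn by its own plt.scatter call, matplotlib gives each one a different color automatically.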

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import train_test_split

    # Toy data: two identical features 0..99; the first 50 rows are class 0, the rest class 1
    X = np.zeros((100, 2))
    X[:, 0] = np.arange(100)
    X[:, 1] = np.arange(100)
    y = np.array([0] * 50 + [1] * 50)

    dtc = DecisionTreeClassifier()
    train_x, test_x, train_y, test_y = train_test_split(X, y, train_size=0.8, shuffle=True)
    dtc = dtc.fit(train_x, train_y)
    dtc_labels = dtc.predict(test_x)

    # A boolean mask picks the rows for each predicted label; columns 0 and 1 are the x and y axes
    plt.scatter(test_x[dtc_labels == 0, 0], test_x[dtc_labels == 0, 1])
    plt.scatter(test_x[dtc_labels == 1, 0], test_x[dtc_labels == 1, 1])
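
The question uses 11 feature columns, while a scatter plot can only show two at a time. As a rough sketch of how the same boolean-mask idea might carry over, the helper below loops over the predicted labels and plots two chosen columns with a legend. The function name scatter_by_predicted_label and its x_col / y_col parameters are hypothetical, and the data from make_classification is only a synthetic stand-in for the asker's dataframe.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def scatter_by_predicted_label(features, predicted, x_col=0, y_col=1, ax=None):
    """Scatter two feature columns, one color per predicted label (hypothetical helper)."""
    features = np.asarray(features)
    predicted = np.asarray(predicted)
    if ax is None:
        _, ax = plt.subplots()
    for label in np.unique(predicted):
        mask = predicted == label  # boolean mask: rows predicted as this label
        ax.scatter(features[mask, x_col], features[mask, y_col],
                   label=f"predicted {label}")
    ax.set_xlabel(f"feature {x_col}")
    ax.set_ylabel(f"feature {y_col}")
    ax.legend()
    return ax


# Assumption: synthetic data with 11 features and 0/1 labels, standing in for df.
X, y = make_classification(n_samples=200, n_features=11, random_state=0)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, train_size=0.8, shuffle=True, random_state=0
)
dtc = DecisionTreeClassifier(random_state=0).fit(train_x, train_y)
dtc_labels = dtc.predict(test_x)

# Plot the first two feature columns of the test set, colored by prediction.
ax = scatter_by_predicted_label(test_x, dtc_labels, x_col=0, y_col=1)
```

Call plt.show() afterwards in an interactive session, or ax.figure.savefig(...) to write the plot to a file. Which two columns are worth plotting depends on the data; passing different x_col and y_col values lets you try several pairs.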