python pandas numpy logistic-regression

How can I plot a scatter graph and plot a prediction line for two features in Python?


I am trying to predict y based on two features held in X. After reading my Excel file and splitting my data into columns, my X value looks like this:

     SibSp  Parch
0        1      0
1        1      0
2        0      0
3        1      0
4        0      0
5        0      0
6        0      0
7        3      1
8        0      2
9        1      0

y denotes survival: 1 means survived, 0 means died. X has many more rows. I am using train_test_split(X, y, test_size=0.4, random_state=101) to split the data into training and testing sets, and I have one method to train and one to test. My training code looks like this:

def train():
    # Get Data Split
    X_train, X_test, y_train, y_test = initData()

    # Create LinearRegression Instance
    lm = LinearRegression()

    # Fit Training Values
    lm.fit(X_train,y_train)

    visualise(X_test, y_test, lm.predict(X_test))

    # Return Trained Data With Testing Data
    return X_test, y_test, lm

My testing code looks like this:

def test():
    # Get The Trained Classifier
    X, y, lm = train()

    # Fit New Values
    lm.fit(X, y)

    visualise(X, y, lm.predict(X))

This appears to work fine, or so I think. I am now trying to visualise the data as a scatter plot with the prediction line drawn on top.

def visualise(X, y, predictions):
    features = X.shape[1]
    colors   = ['red', 'blue']
    i        = 0
    while i <= features -1:
        plt.scatter(X.iloc[:, i], y, color=colors[i])
        # Update: Forgot to add this line when posting question
        plt.plot(X.iloc[:, i], predictions, color=colors[i])
        i += 1

But this gives me crazy output, with lines going everywhere. I looked online and found sklearn's example, which I tried to replicate:

I thought that maybe, because I have two features, I may need to identify them separately.

def visualise(X, y, predictions):
    newY = np.zeros(X.shape[0], X.shape[1]);
    newY[:, 0:1] = newY.iloc[:, 0]
    plt.scatter(X, y, color='blue')
    plt.plot(X, predictions, color='red')

    plt.xticks(())
    plt.yticks(())

    plt.show()

I had to create a newY array because X has two features and y has only one, so the shapes were different. But now I am getting an error at the line newY = np.zeros(X.shape[0], X.shape[1]);

TypeError: data type not understood

Update

def visualise(X, y, predictions):
    newY = np.zeros((X.shape[0], X.shape[1]));
    newY[:, 0] = y
    newY[:, 1] = y
    plt.scatter(X, newY, color='blue')
    plt.plot(X, predictions, color='red')

This fixes the error, but this is my output:

[image: screenshot of the resulting plot]

How can I plot my scatter graph and plot a line for my predictions?


Solution

  • As you have two features, you can't draw a single prediction line: the model's predictions form a surface over the (SibSp, Parch) plane, not a curve along one axis. If anything, you probably want a prediction contour plot.

    Your case is much closer to this two-feature example: https://scikit-learn.org/stable/auto_examples/svm/plot_iris.html
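
As a rough illustration of the contour idea, here is a minimal sketch. It is not taken from the linked example. The data is synthetic stand-in data, because your spreadsheet isn't available, and the helper names prediction_grid and visualise, the 0.5 padding and the 0.1 grid step are all assumptions made for the illustration. It fits LinearRegression on the two columns, evaluates the model on a regular grid covering both feature ranges, and draws the predictions as filled contours with the observed points on top.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression


def prediction_grid(model, X, step=0.1):
    """Evaluate the model on a regular grid spanning both feature ranges."""
    x0_min, x0_max = X.iloc[:, 0].min() - 0.5, X.iloc[:, 0].max() + 0.5
    x1_min, x1_max = X.iloc[:, 1].min() - 0.5, X.iloc[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.arange(x0_min, x0_max, step),
                         np.arange(x1_min, x1_max, step))
    # Reuse the training column names so the grid matches what the model was fitted on
    grid = pd.DataFrame(np.c_[xx.ravel(), yy.ravel()], columns=X.columns)
    zz = model.predict(grid).reshape(xx.shape)
    return xx, yy, zz


def visualise(model, X, y):
    """Draw filled prediction contours with the observations on top."""
    xx, yy, zz = prediction_grid(model, X)
    fig, ax = plt.subplots()
    contour = ax.contourf(xx, yy, zz, cmap="coolwarm", alpha=0.6)
    fig.colorbar(contour, ax=ax, label="predicted y")
    # One point per row: position is the two features, colour is the true label
    ax.scatter(X.iloc[:, 0], X.iloc[:, 1], c=y, cmap="coolwarm", edgecolors="k")
    ax.set_xlabel(X.columns[0])
    ax.set_ylabel(X.columns[1])
    return fig, ax


# Synthetic stand-in data (an assumption, NOT the real spreadsheet)
rng = np.random.default_rng(101)
X = pd.DataFrame({"SibSp": rng.integers(0, 4, size=200),
                  "Parch": rng.integers(0, 3, size=200)})
p = 0.5 - 0.1 * X["SibSp"].to_numpy() + 0.1 * X["Parch"].to_numpy()
y = (rng.random(200) < p).astype(int)

lm = LinearRegression().fit(X, y)
fig, ax = visualise(lm, X, y)
# Call plt.show() (or fig.savefig("contour.png")) to see the figure
```

Because the model is linear, the contour bands come out as parallel straight stripes, which is the two-feature counterpart of a prediction line. Many rows share the same (SibSp, Parch) pair, so the scatter points overlap heavily. Adding a little jitter or sizing the markers by count may make the plot easier to read.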