Tags: python, neural-network, gradient-descent

Poor Accuracy of Gradient Descent Perceptron


I'm trying to make a start with neural networks from the very beginning. This means starting off toying with perceptrons. At the moment I'm trying to implement batch gradient descent. The guide I'm following provided the following pseudocode:
[Image: pseudocode for batch gradient descent on a perceptron]

I've tried implementing it as below with some dummy data and noticed it isn't particularly accurate. It converges to what I presume is a local minimum.

My question is:
What ways are there for me to check that this is in fact a local minimum? I've been looking into how to plot this, but I'm unsure how to actually go about it. In addition, is there a way to achieve a more accurate result using gradient descent, or would I have to use a more complex approach, or possibly run it numerous times starting from different random weights to try to find the global minimum?

I had a look around the forum before posting this, but didn't find much that made me confident that what I'm doing here is in fact correct, so any help would be great.

import numpy as np


def main():

    learningRate = 0.1
    np.random.seed(1)

    trainingInput = np.asmatrix([
              [1, -1],
              [2, 1],
              [1.5, 0.5],
              [2, -1],
              [1, 2]
            ])

    biasAccount = np.ones((5,1))
    trainingInput = np.append(biasAccount, trainingInput, axis=1)
    trainingOutput = np.asmatrix([
                [0],
                [1],
                [0],
                [0],
                [1]
            ])



    weights = np.random.random((3, 1)) - 1  # random initial weights in [-1, 0)

    for iteration in range(10000):
        prediction = np.dot(trainingInput, weights)

        print("Weights: \n" + str(weights))

        print("Prediction: \n" + str(prediction))

        error = trainingOutput - prediction

        print("Error: \n" + str(error))

        # (t - Xw)^T X: the (transposed) negative gradient of the squared error
        intermediateResult = np.dot(error.T, trainingInput)
        delta = learningRate * intermediateResult

        print("Delta: \n" + str(delta))

        weights += delta.T


main()

Solution

  • There is no guarantee that you'll find the global minimum. Often, people perform multiple runs and take the best one. More advanced approaches include decaying the learning rate, using an adaptive learning rate (e.g. with RMSProp or Adam), or using GD with momentum.

    There are multiple ways to monitor convergence:

    • Use the loss (hint: −Xᵀ(t − Xw) is its derivative with respect to w, which is what your update already computes); check for small values or for small changes between iterations.

    • Early stopping: check that the error on a (held-out) validation set decreases; if it doesn't, stop training.

    • (Possibly, you could even check the distance between the weights in consecutive steps to see whether anything is still changing.)
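Putting the bullets above together, here is a minimal sketch that monitors the loss for small changes and restarts from several random initialisations, reusing the question's data; the seeds, tolerance, and iteration cap are illustrative assumptions:

```python
import numpy as np

# Training data from the question: a bias column prepended to two features,
# with 0/1 targets.
X = np.array([[1, 1.0, -1.0],
              [1, 2.0,  1.0],
              [1, 1.5,  0.5],
              [1, 2.0, -1.0],
              [1, 1.0,  2.0]])
t = np.array([[0.0], [1.0], [0.0], [0.0], [1.0]])

def train(seed, lr=0.1, max_iters=10000, tol=1e-10):
    """Batch gradient descent on the squared error, stopping when the
    loss barely changes between iterations."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(-1, 1, size=(3, 1))  # random initial weights in [-1, 1)
    prev_loss = np.inf
    loss = prev_loss
    for _ in range(max_iters):
        error = t - X @ w                      # residual (t - Xw)
        loss = float(np.mean(error ** 2))      # monitor convergence via the loss
        if abs(prev_loss - loss) < tol:        # tiny change => converged
            break
        prev_loss = loss
        w += lr * X.T @ error                  # step along -gradient = X^T (t - Xw)
    return w, loss

# Multiple restarts from different random weights; keep the best run.
results = [train(seed) for seed in range(5)]
best_w, best_loss = min(results, key=lambda r: r[1])
print("best loss across restarts:", best_loss)
```

Because the squared-error loss of a single linear unit is convex, every restart should end up at (numerically) the same loss; that agreement across restarts is itself evidence you have reached the global minimum of this loss rather than a local one.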