With the sample size set to N = 10, the average number of iterations until convergence should be around 15. However, my implementation takes approximately 225 (or more!) iterations to converge. This makes me suspect an issue with the while loop in my code, but I'm unable to identify it.
import numpy as np

def gen_data(N=10):
    size = (N, 2)
    data = np.random.uniform(-1, 1, size)
    # Pick two of the N points to define the target line y = m*x + c
    point1, point2 = data[np.random.choice(data.shape[0], 2, replace=False), :]
    m = (point2[1] - point1[1]) / (point2[0] - point1[0])
    c = point1[1] - m * point1[0]
    # Label each point by which side of the line it falls on
    labels = np.array([+1 if y >= m * x + c else -1 for x, y in data])
    data = np.column_stack((data, labels))
    return data, point1, point2

class PLA:
    def __init__(self, data):
        m, n = data.shape
        # Prepend a bias column of ones to the two feature columns
        self.X = np.hstack((np.ones((m, 1)), data[:, :2]))
        self.w = np.zeros(n)
        self.y = data[:, -1]
        self.count = 0

    def fit(self):
        while True:
            self.count += 1
            y_pred = self.predict(self.X)
            misclassified = np.where(y_pred != self.y)[0]
            if len(misclassified) == 0:
                break
            # Update on a randomly chosen misclassified point
            idx = np.random.choice(misclassified)
            self.update_weight(idx)

    def update_weight(self, idx):
        self.w += self.y[idx] * self.X[idx]

    def sign(self, z):
        return np.where(z > 0, 1, np.where(z < 0, -1, 0))

    def predict(self, x):
        z = np.dot(x, self.w)
        return self.sign(z)
The problem is not in your while loop but in your data generation function.
You choose two points from the N random points to define your decision line:
point1, point2 = data[np.random.choice(data.shape[0], 2, replace=False), :]
Yet they remain in your dataset afterwards, so they are labeled +1 and lie exactly on your decision line.
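You can see this concretely with a quick check (my own snippet, reusing your gen_data):

data, point1, point2 = gen_data(N=10)
m = (point2[1] - point1[1]) / (point2[0] - point1[0])
c = point1[1] - m * point1[0]
# Both defining points sit exactly on the line, i.e. zero margin
for x, y in (point1, point2):
    print(y - (m * x + c))  # ~0.0 up to float error; each is labeled +1 since y >= m*x + c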
If you take two random points that are not in your dataset, then from what I tested the algorithm converges in roughly 10 steps (just sample N + 2 points, use the first two to define your decision line and the remaining N as your dataset), as sketched below.
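A minimal sketch of that fix, keeping your function's signature (the N + 2 sampling scheme is my suggestion, not your original code):

def gen_data(N=10):
    # Sample N + 2 points: the first two only define the line, the last N form the dataset
    points = np.random.uniform(-1, 1, (N + 2, 2))
    point1, point2 = points[0], points[1]
    data = points[2:]
    m = (point2[1] - point1[1]) / (point2[0] - point1[0])
    c = point1[1] - m * point1[0]
    labels = np.array([+1 if y >= m * x + c else -1 for x, y in data])
    data = np.column_stack((data, labels))
    return data, point1, point2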
So why does this small difference slow convergence so much?
I would say it is because two points of the dataset lie exactly on the decision line, which makes a zero-error model harder to learn, especially when other points are also close to the line: a single update may still leave the model imperfect. Formally, the classic perceptron convergence bound grows like (R/γ)², where R bounds the norm of the points and γ is the best achievable margin; placing labeled points exactly on the target line shrinks that margin, and the bound amplifies this quadratically.
Is it relevant to define your decision line such that no point in your dataset lies on it?
I would say yes, as the domain is continuous: a point sampled uniformly from [-1, 1]² lies exactly on any fixed line with probability zero, so realistic data would essentially never sit on the true boundary.
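To see the difference empirically, you can average the iteration count of your PLA class over many trials with each generator (a rough sketch; the trial count and the helper name average_steps are my choices):

def average_steps(gen, trials=1000, N=10):
    # Average number of PLA iterations over `trials` random datasets from `gen`
    total = 0
    for _ in range(trials):
        data, _, _ = gen(N)
        model = PLA(data)
        model.fit()
        total += model.count
    return total / trials

# Bind the original and fixed generators to different names, then compare, e.g.:
# print(average_steps(gen_data_original), average_steps(gen_data_fixed))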