Assume two perceptrons are trained on infinite samples from the same linearly separable distribution. Will they converge to the same decision function? Will they converge to the same weight vector w? I'm a beginner in ML, so it would be great if anyone could provide a detailed explanation.
If the learning rate is sufficiently small, they will converge to the same decision boundary. But depending on the initial weights of the two Perceptrons (assuming they are randomized separately), the final weights of the two Perceptrons may well differ. Note that the weights associated with the inputs are the coefficients of a separating hyperplane, and these coefficients are not unique: for example, doubling all of the coefficients leaves the location of the plane unchanged. Therefore, it is entirely possible (and indeed likely) that the limiting weights of the two Perceptrons will not be equal, even though they describe the same boundary.
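You can see this with a small experiment. Below is a minimal sketch (not from the original post) that trains two perceptrons with the classic mistake-driven update rule on the same toy data but with different random initial weights; the data-generating rule (label = sign of x1 + x2, with a margin enforced), the learning rate, and the epoch budget are all illustrative assumptions. The raw weight vectors typically come out different, yet both separate the data, so the two learned decision functions agree on every sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable 2-D data (illustrative): label = sign(x1 + x2),
# with points too close to the true boundary dropped so a positive margin exists.
X = rng.uniform(-1, 1, size=(2000, 2))
X = X[np.abs(X.sum(axis=1)) > 0.2][:500]
y = np.where(X.sum(axis=1) > 0, 1, -1)
X = np.hstack([X, np.ones((len(X), 1))])      # constant feature for the bias term

def train_perceptron(X, y, w_init, lr=0.1, max_epochs=1000):
    """Classic perceptron rule: update weights only on misclassified samples."""
    w = w_init.astype(float).copy()
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:            # misclassified (or exactly on the boundary)
                w += lr * yi * xi
                mistakes += 1
        if mistakes == 0:                     # converged: every sample classified correctly
            break
    return w

# Two perceptrons, same data, different random initial weights.
w1 = train_perceptron(X, y, rng.normal(size=3))
w2 = train_perceptron(X, y, rng.normal(size=3))

pred1 = np.sign(X @ w1)
pred2 = np.sign(X @ w2)

print("weights 1:", w1)
print("weights 2:", w2)                       # generally different numbers
print("predictions agree on all samples:", np.all(pred1 == pred2))
```

Running this, the two weight vectors are generally not equal (and not even simple rescalings of each other, since the perceptron stops as soon as it makes a full error-free pass), but both define separating hyperplanes for the data, which is why the final agreement check prints True.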