machine-learning svm

In the context of hard-margin SVMs, what happens when a point violates the margin?


I'm currently self-studying machine learning theory and am reading through information about hard-margin and soft-margin support vector machines. I know that soft-margin SVMs can be useful when we have non-linearly separable data and we want to allow for some noise such that we might have points that "violate" the margin. However, I'm wondering about two related cases to this idea but in the context of hard-margin SVMs: (a) if you add a new point to the dataset that "violates" the hard-margin SVM, would this necessarily cause the margin to shrink? (b) similarly, if instead of adding a new point you perturbed a point already in the dataset such that it now "violated" the margin, would this cause the margin to shrink?

Intuitively, I don't think the margin would necessarily shrink but rather that the hard-margin SVM would no longer be feasible and you would instead want to use a soft-margin SVM. For example, if you're thinking about a binary classification problem, if the violating point was so egregious that it crossed the decision boundary and into the cluster of points with a different classification, I don't see how the margin could shrink — it seems like you would have to just use a soft-margin SVM.


Solution

  • For a hard-margin SVM, the decision boundary is defined solely by the support vectors: the points closest to it, which fix the widest margin that still maintains strict separation. If any one of those points moves inwards towards the decision plane, the margin will shrink accordingly.

    So if the new (or perturbed) point lands inside the current margin but all the data are still linearly separable, a hard-margin solution still exists and the margin shrinks to maintain separation.

    Move the point far enough, though, and you can no longer draw a straight line/hyperplane that keeps the classes separate. At that stage the hard-margin problem is infeasible: there is no hard-margin solution to be found.

    In practice, the SVMs in sklearn are soft-margin classifiers: they have a regularisation parameter C that penalises margin violations more heavily the larger it is (too high and the model begins to overfit), so a very large C approximates a hard margin.
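
The cases above can be checked on a toy dataset. This is only a rough sketch: it uses sklearn's `SVC` with a linear kernel and a very large `C` as a stand-in for a true hard-margin SVM, and the data points and the helper name `fit_margin` are made up for illustration.

```python
import numpy as np
from sklearn.svm import SVC


def fit_margin(X, y, C=1e6):
    """Fit a linear SVC with a very large C (an approximation of a hard margin).

    Returns the margin width 2 / ||w|| and the training accuracy.
    """
    clf = SVC(kernel="linear", C=C)
    clf.fit(X, y)
    w = clf.coef_[0]
    return 2.0 / np.linalg.norm(w), clf.score(X, y)


# Illustrative toy data: class -1 sits at x = -2, class +1 sits at x = 2.
X = np.array([[-2.0, 0.0], [-2.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([-1, -1, 1, 1])
base_margin, base_acc = fit_margin(X, y)

# (a) Add a class +1 point inside the old margin, still on the correct side.
X_add = np.vstack([X, [1.0, 0.5]])
y_add = np.append(y, 1)
margin_add, acc_add = fit_margin(X_add, y_add)

# (b) Perturb an existing support vector so that it moves into the old margin.
X_move = X.copy()
X_move[2] = [1.0, 0.0]
margin_move, acc_move = fit_margin(X_move, y)

# (c) Add a class +1 point in the middle of the class -1 points.
#     The data are no longer linearly separable, so no hard margin exists;
#     the large-C SVC still returns a fit, but it must misclassify something.
X_bad = np.vstack([X, [-2.0, 0.5]])
y_bad = np.append(y, 1)
margin_bad, acc_bad = fit_margin(X_bad, y_bad)

print(f"baseline:        margin={base_margin:.3f}  accuracy={base_acc:.2f}")
print(f"(a) added point: margin={margin_add:.3f}  accuracy={acc_add:.2f}")
print(f"(b) moved point: margin={margin_move:.3f}  accuracy={acc_move:.2f}")
print(f"(c) inseparable: margin={margin_bad:.3f}  accuracy={acc_bad:.2f}")
```

In cases (a) and (b) the data stay separable, so the fit still classifies every training point correctly but with a narrower margin (roughly 4 shrinks to roughly 3 here). In case (c) the training accuracy drops below 1, which is the soft-margin classifier tolerating a violation that a hard-margin SVM could not accommodate at all.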