Tags: machine-learning, probability

Machine Learning - What's the point of finding bounds?


I'm currently studying Machine Learning at university with the help of the Foundations of Machine Learning (Second Edition) textbook. I've learned about PAC Learning, Rademacher Complexity, Growth Functions, and VC-Dimension. Throughout the textbook and course, we seem to be spending a lot of time finding bounds.

I'm somewhat confused as to why we're doing this.

  1. What exactly is the whole point of finding these bounds?
  2. How are these bounds actually applied in the field of machine learning in the real world?

Neither the textbook nor my professor has been particularly helpful in answering these questions.

Thanks.


Solution

  • First, be careful not to confuse a *decision boundary* (the surface a classifier uses to split the input space into two sets – the most simplistic view of pattern recognition) with the bounds you are studying, which are *generalization bounds*:

    In the theory of statistical machine learning, a generalization bound – or, more precisely, a generalization error bound – is a statement about the predictive performance of a learning algorithm or class of algorithms.

    So the whole point of finding these bounds is to quantify how well an algorithm trained on a finite sample will perform on unseen data – that is, to characterize its performance (or efficiency) with a provable guarantee rather than just an empirical observation.

    This may seem like a trivial question: the answer is simply that the learning algorithm can search the entire hypothesis space looking for its optimal solution. While this answer is correct, we need a more formal answer in light of the generalization inequality we're studying.

    If your question is really about why so much *time* is spent on this – in other words, why we need to consider every possible hypothesis – the same reasoning applies. Because the learning algorithm can, in principle, return any hypothesis in the space, a guarantee on the hypothesis it actually returns must hold uniformly over the whole hypothesis space. That is exactly why the bounds (growth function, VC-dimension, Rademacher complexity) are stated over the entire hypothesis class rather than for a single fixed hypothesis.
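To make this concrete, here is a minimal sketch of the simplest such guarantee: the Hoeffding plus union bound for a *finite* hypothesis class. (This specific formula is an illustrative standard result, not something taken from the question's textbook; the function name and numbers are my own.) It says that, with probability at least 1 − δ, *every* hypothesis in the class has true error within ε of its empirical error – the "uniform over all hypotheses" idea from above, in one line of arithmetic:

```python
import math

def generalization_gap_bound(m: int, num_hypotheses: int, delta: float) -> float:
    """Hoeffding + union bound for a finite hypothesis class H.

    With probability >= 1 - delta over an i.i.d. sample of size m,
    for EVERY h in H:
        |true_error(h) - empirical_error(h)| <= sqrt(ln(2|H|/delta) / (2m))
    """
    return math.sqrt(math.log(2 * num_hypotheses / delta) / (2 * m))

# Example: 1,000 hypotheses, 10,000 training samples, 95% confidence.
gap = generalization_gap_bound(m=10_000, num_hypotheses=1_000, delta=0.05)
print(f"empirical error is within ±{gap:.3f} of true error")
# → empirical error is within ±0.023 of true error
```

Note how the bound shrinks as the sample size m grows and loosens as the hypothesis class grows – which is precisely the trade-off that growth functions, VC-dimension, and Rademacher complexity generalize to infinite hypothesis classes.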