Search code examples
statisticsbigdatadata-analysisdata-scienceprobability-density

When and why would you want to use a Probability Density Function?


A wanna-be data-scientist here and am trying to understand as a data scientist, when and why would you use a Probability Density Function (PDF)?

Sharing a scenario and a few pointers to learn about this and other such functions like CDF and PMF would be really helpful. Know of any book that talks about these functions from practice stand-point?


Solution

  • Why?

    Probability theory is very important for modern data-science and machine-learning applications, because (in a lot of cases) it allows one to "open up a black box" and shed some light into the model's inner workings, and with luck find necessary ingredients to transform a poor model into a great model. Without it, a data scientist's work is very much restricted in what they are able to do.

    A PDF is a fundamental building block of the probability theory, absolutely necessary to do any sort of probability reasoning, along with expectation, variance, prior and posterior, and so on.

    Some examples here on StackOverflow, from my own experience, where a practical issue boils down to understanding data distribution:

    When?

    The questions above provide some examples, here're a few more if you're interested, and the list is by no means complete:

    I personally try to find probabilistic interpretation whenever possible (choice of loss function, parameters, regularization, architecture, etc), because this way I can move from blind guessing to making reasonable decisions.

    Reading

    This is very opinion-based, but at least few books are really worth mentioning: The Elements of Statistical Learning, An Introduction to Statistical Learning: with Applications in R or Pattern Recognition and Machine Learning (if your primary interest is machine learning). That's just a start, there are dozens of books on more specific topics, like computer vision, natural language processing and reinforcement learning.