Search code examples
machine-learningnormal-distributionlogistic-regressiongenerative-programming

When to use generative algorithms in machine learning?


Suppose I have a training set made by (x, y) samples.

To apply a generative algorithm, let's say the Gaussian discriminative, I must assume that

p(x|y) ~ Normal(mu, sigma) for every possible sigma

or I just need to I know if x ~ Normal(mu, sigma) given y?

How can I evaluate if p(x|y) follows a multivariate Normal distribution well enough (up to a threshold) to me to use generative algorithm?


Solution

  • That's a lot of questions.

    To apply a generative algorithm, let's say the Gaussian discriminative, I must assume that

    p(x|y) ~ Normal(mu, sigma) for every possible sigma

    No, you must assume that's true for some mu, sigma pair. In practice you won't know what mu and sigma is, so you'll need to either estimate it (frequentist, Max Likelihood/Max A Posteriori estimates), or even better incorporate uncertainty about your estimates of the parameters into predictions (Bayesian methodology).

    How can I evaluate if p(x|y) follows a multivariate Normal distribution?

    Classically, using a goodness of fit test. If the dimensionality of x is more than a handful, though, this won't work because standard tests involve the number of items in bins, and the number of bins you need in high dimensions is astronomical so you have very low expected counts.

    A better idea is to say the following: what are my options for modelling the (conditional) distribution of x? You can compare between these options using model comparison techniques. Read up on model checking and comparison.

    Finally, your last point:

    well enough (up to a threshold) to me to use generative algorithm?

    The paradox of many generative methods, including Fisher's Linear Discriminant Analysis for example, as well as the Naive Bayes classifier, is that the classifier can work very well even though the model is poor for the data. There's no particularly sound reason why this should be the case, but many have observed it to be empirically true. Whether it works can be checked much more easily than whether the assumed distribution explains the data very well: just split your data into training and testing and find out!