machine-learning, bayesian

Bayesian method: which part is hard to evaluate in Bayesian inference


I have a question about posterior inference in Bayesian statistics.

In Bayesian inference, suppose we are given a model p(x|θ) and a prior distribution p(θ), and we observe a dataset D = {x_1, x_2, ..., x_N}. The goal is to estimate the usually intractable posterior p(θ|D).

Sometimes I see people choose to evaluate the joint p(θ, D) instead, because the joint is proportional to the posterior: p(θ|D) = p(θ, D)/p(D). What is the reason behind this? Isn't p(D) also hard to evaluate? Thank you for any advice.
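
(As I understand it, the evidence is the marginal p(D) = ∫ p(D|θ) p(θ) dθ, an integral over the whole parameter space that rarely has a closed form, so I don't see what evaluating the joint buys us.)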


Solution

  • You want to maximise p(θ|D) by finding the optimal parameters θ.

    This can be rewritten as argmax_θ p(θ|D) p(D).

    However, p(D) is independent of θ, so multiplying by it does not change which θ wins; you can ignore the evidence and work with the joint instead (see the sketch after this list). In readable mathematical notation:

    argmax_θ p(θ|D) = argmax_θ p(θ|D) p(D) = argmax_θ p(θ, D) = argmax_θ p(D|θ) p(θ)
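
Below is a minimal sketch of this idea in practice, assuming a toy conjugate setup (a Gaussian likelihood with known unit variance and a standard-normal prior on the mean; none of these modelling choices come from the question itself). The MAP estimate is found by maximising the log joint, log p(D|θ) + log p(θ), and p(D) never has to be computed:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical setup: x_i ~ N(theta, 1) with a N(0, 1) prior on theta.
rng = np.random.default_rng(0)
D = rng.normal(loc=2.0, scale=1.0, size=50)  # stand-in for the observed dataset

def log_joint(theta):
    """log p(theta, D) = log p(D|theta) + log p(theta), up to additive constants."""
    log_likelihood = -0.5 * np.sum((D - theta) ** 2)  # Gaussian likelihood, sigma = 1
    log_prior = -0.5 * theta**2                       # standard-normal prior
    return log_likelihood + log_prior

# Maximise the log joint; the evidence p(D) never appears.
result = minimize_scalar(lambda t: -log_joint(t))
print("MAP estimate:", result.x)

# For this conjugate model the exact MAP is sum(D) / (N + 1),
# so the two values should agree closely.
print("Closed form: ", D.sum() / (len(D) + 1))
```

The same cancellation is what lets MCMC methods such as Metropolis–Hastings sample from the posterior: acceptance ratios only involve p(θ', D) / p(θ, D), so the intractable p(D) divides out.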