I have a question about posterior inference in Bayesian statistics.
In Bayesian inference, suppose we are given a model p(x|\theta) and a prior distribution p(\theta), and we observe the dataset D = {x_1, x_2, ..., x_N}. The goal is to estimate the usually intractable posterior p(\theta|D).
Sometimes I see people choose to evaluate the joint p(\theta, D) instead, because it is proportional to the posterior: p(\theta|D) = p(\theta, D)/p(D). What is the reason behind this? Isn't p(D) hard to evaluate? Thank you for any advice.
You want to maximise p(θ|D) by finding the optimal parameters θ. Since p(D) > 0 and it does not depend on θ, multiplying the objective by p(D) does not change the maximiser, so you can rewrite it as p(θ|D) p(D) = p(θ, D) and never evaluate p(D) at all. In readable mathematical notation:

argmax_θ p(θ|D) = argmax_θ p(θ|D) p(D) = argmax_θ p(θ, D) = argmax_θ p(D|θ) p(θ)

So maximising the posterior is the same as maximising the joint, which factorises into a likelihood and a prior that you can evaluate directly.
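To make this concrete, here is a minimal sketch in Python (a toy conjugate Gaussian model of my own choosing, not from your question): it finds the MAP estimate by maximising log p(θ, D) = log p(θ) + log p(D|θ), with p(D) never appearing, and checks the result against the closed-form posterior mode.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy model (assumed for illustration): x_i ~ Normal(theta, sigma^2)
# with known sigma, and prior theta ~ Normal(mu0, tau0^2).
rng = np.random.default_rng(0)
sigma, mu0, tau0 = 1.0, 0.0, 2.0
D = rng.normal(loc=1.5, scale=sigma, size=50)  # observed data

def log_joint(theta):
    # log p(theta, D) = log p(theta) + sum_i log p(x_i | theta),
    # dropping additive constants (including log p(D)), which do
    # not depend on theta and so cannot move the argmax.
    log_prior = -0.5 * (theta - mu0) ** 2 / tau0 ** 2
    log_lik = -0.5 * np.sum((D - theta) ** 2) / sigma ** 2
    return log_prior + log_lik

# Maximise the log joint; p(D) is never computed.
res = minimize_scalar(lambda t: -log_joint(t))

# Closed-form posterior mode for this conjugate model, as a sanity check.
post_var = 1.0 / (1.0 / tau0 ** 2 + len(D) / sigma ** 2)
post_mode = post_var * (mu0 / tau0 ** 2 + D.sum() / sigma ** 2)
print(res.x, post_mode)  # the two estimates agree
```

The same reasoning is why sampling and approximation methods get away without p(D): Metropolis-Hastings only ever uses ratios of the target density, in which p(D) cancels, and variational inference optimises a bound built from log p(θ, D), so knowing the posterior up to the constant p(D) is enough.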