Given a standard LDA model with a few thousand topics and a few million documents, trained with Mallet / collapsed Gibbs sampler:
When inferring a new document: why not just skip sampling and simply use the term-topic counts of the model to determine the topic assignments of the new document? I understand that applying Gibbs sampling to the new document takes into account the topic mixture of that document, which in turn influences how topics are composed (beta, the term-frequency distributions). However, as the topics are kept fixed when inferring a new document, I don't see why this should be relevant.
An issue with sampling is its probabilistic nature: sometimes the topic assignments inferred for a document vary greatly across repeated invocations. Therefore I would like to understand the theoretical and practical value of sampling vs. just using a deterministic method.
Just using the term-topic counts of the last Gibbs sample is not a good idea. Such an approach doesn't take into account the topic structure of the new document: if a document has many words from one topic, it's likely to have even more words from that topic [1].
For example, say two words have equal probabilities in two topics. The topic assignment of the first word in a given document affects the topic probability of the other word: the other word is more likely to be in the same topic as the first one. The relation works the other way also. The complexity of this situation is why we use methods like Gibbs sampling to estimate values for this sort of problem.
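To make the coupling concrete, here is a minimal sketch of the per-word conditional used when Gibbs sampling a new document with the topics held fixed. Everything in it is an illustrative assumption rather than Mallet's actual code: the two-topic `PHI` table, the symmetric document-topic prior `ALPHA`, and the helper names `conditional` and `infer`.

```python
import random

# Hypothetical fixed topic-word probabilities from a trained model (2 topics).
# "bank" has the same probability in both topics, so counts alone cannot decide.
PHI = [
    {"bank": 0.3, "money": 0.6, "river": 0.1},  # topic 0
    {"bank": 0.3, "money": 0.1, "river": 0.6},  # topic 1
]
ALPHA = 0.1  # assumed symmetric prior on the document-topic distribution


def conditional(word, doc_topic_counts):
    """p(z = k | assignments of the other words, word), with topics held fixed."""
    weights = [PHI[k][word] * (doc_topic_counts[k] + ALPHA) for k in range(len(PHI))]
    total = sum(weights)
    return [w / total for w in weights]


def infer(doc, n_iter=200, seed=0):
    """Gibbs-sample topic assignments for one new document."""
    rng = random.Random(seed)
    num_topics = len(PHI)
    z = [rng.randrange(num_topics) for _ in doc]      # random initial assignments
    counts = [z.count(k) for k in range(num_topics)]  # words per topic in this doc
    for _ in range(n_iter):
        for i, word in enumerate(doc):
            counts[z[i]] -= 1                         # remove word i from the counts
            probs = conditional(word, counts)
            z[i] = rng.choices(range(num_topics), weights=probs)[0]
            counts[z[i]] += 1                         # add it back under the new topic
    return z


# With no other words assigned, "bank" is a coin flip between the topics.
print(conditional("bank", [0, 0]))
# With three other words already in topic 0, "bank" is pulled strongly towards topic 0.
print(conditional("bank", [3, 0]))
```

The model's term-topic counts only supply the `PHI[k][word]` factor. The `doc_topic_counts[k] + ALPHA` factor depends on the other assignments in the same document, which is exactly the information a purely count-based lookup throws away.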
As for your comment on topic assignments varying, that can't be helped, and could be taken as a good thing: if a word's topic assignment varies, you can't rely on it. What you're seeing is that the posterior distribution over topics for that word has no clear winner, so you should take a particular assignment with a grain of salt :)
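If you want something more stable than a single draw, one common option is to keep many samples after a burn-in period and report, for each word, the fraction of samples in which it was assigned to each topic. The sketch below does that for a toy model; the `PHI` table, `ALPHA`, the burn-in length, and the function name `assignment_posterior` are all assumptions made for illustration, not part of Mallet.

```python
import random
from collections import Counter

# Hypothetical fixed topic-word probabilities; "bank" is ambiguous by construction.
PHI = [
    {"bank": 0.3, "money": 0.6, "river": 0.1},  # topic 0
    {"bank": 0.3, "money": 0.1, "river": 0.6},  # topic 1
]
ALPHA = 0.1  # assumed symmetric prior on the document-topic distribution


def assignment_posterior(doc, n_iter=2000, burn_in=200, seed=0):
    """Estimate p(z_i = k | doc) per word by averaging over Gibbs samples."""
    rng = random.Random(seed)
    num_topics = len(PHI)
    z = [rng.randrange(num_topics) for _ in doc]
    counts = [z.count(k) for k in range(num_topics)]
    tallies = [Counter() for _ in doc]
    for it in range(n_iter):
        for i, word in enumerate(doc):
            counts[z[i]] -= 1
            weights = [PHI[k][word] * (counts[k] + ALPHA) for k in range(num_topics)]
            z[i] = rng.choices(range(num_topics), weights=weights)[0]
            counts[z[i]] += 1
        if it >= burn_in:  # only tally sweeps after the burn-in period
            for i, topic in enumerate(z):
                tallies[i][topic] += 1
    kept = n_iter - burn_in
    return [[tallies[i][k] / kept for k in range(num_topics)] for i in range(len(doc))]


doc = ["bank", "money", "money", "river"]
for word, probs in zip(doc, assignment_posterior(doc)):
    print(word, [round(p, 2) for p in probs])
```

A word whose estimated probabilities sit near an even split is one whose single-sample assignment should not be trusted; a word with one dominant topic will come out the same way on almost every run.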
[1] assuming alpha, the prior on document-topic distributions, encourages sparsity, as is usually chosen for topic models.