I have managed to estimate most of the parameters of a particular Hidden Markov Model (HMM) from the training dataset: the emission probabilities of the hidden states and the transition matrix $P$ of the Markov chain, both learned with Gibbs sampling. One set of parameters is still missing: the initial distribution $\pi$ (the probability distribution over which state the chain starts in). I would like to deduce it from the parameters I have already learned. How can I do that?
Also, is it true that $\pi$ is the same as the stationary probability distribution of $P$?
The easiest way to achieve this is to add a special [start] state to the model. You then know that every sequence begins in this state, and the transition probabilities from [start] to the other states are learned together with the rest of the model; those probabilities play exactly the role of $\pi$.
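As a minimal sketch of this trick (the matrix entries and the learned start probabilities below are made-up placeholder values, not outputs of any real training run): we augment a learned $n \times n$ transition matrix with an extra [start] state that no other state ever transitions into, so it is only occupied at time 0.

```python
import numpy as np

# Hypothetical learned 3-state transition matrix (rows sum to 1).
P = np.array([
    [0.9, 0.05, 0.05],
    [0.1, 0.8,  0.1 ],
    [0.2, 0.2,  0.6 ],
])
n = P.shape[0]

# Transitions out of [start]; in the augmented model these are learned
# like any other row of the transition matrix and act as pi.
start_probs = np.array([0.5, 0.3, 0.2])  # assumed values for illustration

# Augmented (n+1) x (n+1) matrix: index 0 is [start]. No state
# transitions back into [start], so it only appears at time 0.
P_aug = np.zeros((n + 1, n + 1))
P_aug[0, 1:] = start_probs
P_aug[1:, 1:] = P

# Every row still sums to 1, so P_aug is a valid transition matrix.
print(P_aug.sum(axis=1))
```

With this construction you never estimate $\pi$ separately: it falls out of the same Gibbs updates that estimate the other rows of the transition matrix.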
Regarding your second question: no, not in general. The stationary distribution of the Markov chain is the distribution $\pi^*$ satisfying $\pi^* = \pi^* P$, i.e. a left eigenvector of $P$ with eigenvalue $1$, not a marginal of $P$. The initial distribution $\pi$ coincides with $\pi^*$ only if you additionally assume the chain starts at stationarity; otherwise $\pi$ is a free parameter that cannot be recovered from $P$ alone.
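If you do want to adopt the stationarity assumption, the stationary distribution can be computed numerically as the eigenvector of $P^\top$ for eigenvalue $1$, normalized to sum to one. A minimal sketch (the transition matrix is a made-up example):

```python
import numpy as np

# Hypothetical 3-state transition matrix (rows sum to 1).
P = np.array([
    [0.9, 0.05, 0.05],
    [0.1, 0.8,  0.1 ],
    [0.2, 0.2,  0.6 ],
])

# pi* satisfies pi* = pi* @ P, i.e. pi* is a left eigenvector of P
# (equivalently an eigenvector of P.T) with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
idx = np.argmin(np.abs(eigvals - 1.0))   # pick the eigenvalue closest to 1
pi_star = np.real(eigvecs[:, idx])
pi_star = pi_star / pi_star.sum()        # normalize to a probability vector

print(pi_star)
print(np.allclose(pi_star @ P, pi_star))  # invariance check
```

For an irreducible, aperiodic chain this eigenvector is unique up to scale, so the normalization gives a well-defined probability distribution.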