Wikipedia says you have no knowledge of what the first state is, so you have to assign each state equal probability in the prior state vector. But you do know what the transition probability matrix is, and the left eigenvector of that matrix with eigenvalue 1 gives the long-run frequency of each state in the HMM (I think), so why not use that vector as the prior state vector instead?
This is really a modelling decision. Your suggestion is certainly possible, because it essentially corresponds to prefixing the observations with a long stretch of time steps whose hidden states are either unobserved or have no effect on the output - this gives the chain time to settle down to its equilibrium distribution before the real observations begin.
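Concretely, the stationary vector the questioner describes can be computed as the left eigenvector of the transition matrix with eigenvalue 1. A minimal sketch with NumPy, using a made-up 3-state transition matrix:

```python
import numpy as np

# Hypothetical 3-state transition matrix (each row sums to 1).
A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])

# The stationary distribution pi satisfies pi @ A = pi, i.e. it is a
# left eigenvector of A with eigenvalue 1, equivalently a right
# eigenvector of A.T.
eigvals, eigvecs = np.linalg.eig(A.T)

# Pick the eigenvector whose eigenvalue is numerically closest to 1.
idx = np.argmin(np.abs(eigvals - 1.0))
pi = np.real(eigvecs[:, idx])
pi = pi / pi.sum()   # normalise so it is a probability vector

print(pi)            # stationary distribution
print(pi @ A)        # the same vector: it is invariant under A
```

For an ergodic chain this `pi` is unique and strictly positive, and it is exactly the distribution the first state would have after the hypothetical long unobserved prefix described above.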
But if you have a stretch of observations with a delimited start, such as a segment of speech that begins when the speaker does, or a segment of text that begins at the start of a sentence, there is no particular reason to believe that the distribution of the very first state matches the equilibrium distribution: I doubt very much that 'e' is the most common character at the start of a sentence, whereas it is well known to be the most common character in English text overall.
It may not matter very much what you choose, unless you are processing a lot of very short observation sequences together. Most of the time I would only worry if you wanted to set one of the initial state probabilities to zero, because the Baum-Welch (EM) algorithm commonly used to optimise HMM parameters will never re-estimate a parameter away from exactly zero: each update is proportional to the current value, so a zero stays a zero.
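That zero-sticking behaviour is easy to see numerically: Baum-Welch re-estimates the initial probability of state i as gamma_1(i) = P(state_1 = i | observations), which is proportional to the current pi_i, so a zero entry can never recover. A small demonstration with a hypothetical 2-state, 2-symbol HMM whose first initial probability is set to zero:

```python
import numpy as np

# Hypothetical 2-state HMM; state 0's initial probability is zero.
pi = np.array([0.0, 1.0])
A = np.array([[0.6, 0.4],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],   # emission probabilities for symbols 0 and 1
              [0.2, 0.8]])
obs = [0, 1, 0]             # a short observation sequence

T, N = len(obs), len(pi)

# Forward pass: alpha[t, i] = P(obs[:t+1], state_t = i).
alpha = np.zeros((T, N))
alpha[0] = pi * B[:, obs[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

# Backward pass: beta[t, i] = P(obs[t+1:] | state_t = i).
beta = np.zeros((T, N))
beta[-1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

# Baum-Welch re-estimates pi as gamma_1(i) = alpha[0, i] * beta[0, i],
# normalised. Since alpha[0, i] contains a factor of pi[i], the zero
# entry is preserved exactly.
gamma1 = alpha[0] * beta[0]
gamma1 /= gamma1.sum()
print(gamma1)   # first entry is exactly 0.0
```

This is why it is usually safer to give every state a small positive initial probability (or a uniform prior) and let the data pull it down, rather than hard-coding a zero you might later want the model to revise.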