time-series sampling measurement probability-theory

Estimating change of a cyclic boolean variable

We have a boolean variable X which is either true or false and alternates at each time step with a probability p. I.e. if p is 0.2, X would alternate once every 5 time steps on average. We also have a time line and observations of the value of this variable at various non-uniformly sampled points in time.

How would one learn, from observations, the probability that after t+n time steps where t is the time X is observed and n is some time in the future that X has alternated/changed value at t+n given that p is unknown and we only have observations of the value of X at previous times? Note that I count changing from true to false and back to true again as changing value twice.

Solution

I'm going to approach this problem as if it were on a test.

First, let's name the variables.

B_x is value of the boolean variable after x opportunities to flip (and B₀ is the initial state). P is the chance of changing to a different value every opportunity.

Given that each flip opportunity is not related to other flip opportunities (there is, for example, no minimum number of opportunities between flips) the math is extremely simple; since events are not affected by the events of the past, we can consolidate them into a single computation, which works best when considering B_x not as a boolean value, but as itself a probability.

Here is the domain of the computations we will use: B_x is a probability (with a value between 0 and 1 inclusive) representing the likelyhood of truth. P is a probability (with a value between 0 and 1 inclusive) representing the likelyhood of flipping at any given opportunity.

The probability of falseness, 1 - B_x, and the probability of not flipping, 1 - P, are probabilistic identities which should be quite intuitive.

Assuming these simple rules, the general probability of truth of the boolean value is given by the recursive formula B_x+1 = B_x*(1-P) + (1-B_x)*P.

Code (in C++, because it's my favorite language and you didn't tag one):

int   max_opportunities = 8;  // Total number of chances to flip.
float flip_chance = 0.2;      // Probability of flipping each opportunity.
float probability_true = 1.0; // Starting probability of truth.
                              // 1.0 is "definitely true" and 0.0 is 
                              // "definitely false", but you can extend this
                              // to situations where the initial value is not
                              // certain (say, 0.8 = 80% probably true) and
                              // it will work just as well.
for (int opportunities = 0; opportunities < max_opportunities; ++opportunities)
{
    probability_true = probability_true * (1 - flip_chance) + 
                       (1 - probability_true) * flip_chance;
}

Here is that code on ideone (the answer for P=0.2 and B₀=1 and x=8 is B₈=0.508398). As you would expect, given that the value becomes less and less predictable as more and more opportunities pass, the final probability will approach B_x=0.5. You will also observe oscillations between more and less likely to be true, if your chance of flipping is high (for instance, with P=0.8, the beginning of the sequence is B={1.0, 0.2, 0.68, 0.392, 0.46112, ...}.

For a more complete solution that will work for more complicated scenarios, consider using a stochastic matrix (page 7 has an example).