
Representing the state space graph for the Markov Decision Process in the car racing example


Could anybody please help me design the state space graph for the Markov Decision Process of the car racing example from Berkeley CS188?

(image: car racing example state diagram)

For example, suppose I can take 100 actions, and I want to run value iteration to find the best policy for maximizing my rewards.

With only 3 states (Cool, Warm, and Overheated), I don't know how to add an "End" state and complete the MDP.

I am thinking about having 100 Cool states and 100 Warm states, so that, for example, from Cool1 you can go to Cool2, Warm2, or Overheated, and so on. In this design, the values of states close to step 0 are higher than those of states close to step 100.

Am I missing something about MDPs?


Solution

  • There should only be 3 states. The "Cool" and "Warm" states are recurrent, and the "Overheated" state is absorbing, because the probability of leaving it is 0.

    You can take two actions, slow or fast, in both the "cool" and "warm" states, as described in the problem statement. The transition probability matrix and step rewards can be read directly off the chart. For example, P(go fast, from cool to warm) = 0.5, and R(go fast, from cool to warm) = 2.

    Depending on the objective, you can solve it as a finite-horizon or an infinite-horizon (discounted) MDP. Your budget of 100 actions corresponds to a horizon of 100: instead of duplicating states per step, keep the 3-state model and run 100 backups of value iteration, as in the sketch below.
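
For concreteness, here is a minimal Python sketch of finite-horizon value iteration on this 3-state MDP. It is not part of the original answer, and the transition probabilities and rewards are the standard CS188 numbers (e.g., slow from cool always stays cool with reward +1; fast from warm always overheats with reward -10), so verify them against your chart:

```python
# Finite-horizon value iteration for the CS188 racing-car MDP.
# (state, action) -> list of (next_state, probability, reward)
TRANSITIONS = {
    ("cool", "slow"): [("cool", 1.0, 1.0)],
    ("cool", "fast"): [("cool", 0.5, 2.0), ("warm", 0.5, 2.0)],
    ("warm", "slow"): [("cool", 0.5, 1.0), ("warm", 0.5, 1.0)],
    ("warm", "fast"): [("overheated", 1.0, -10.0)],
}
STATES = ["cool", "warm", "overheated"]  # "overheated" is absorbing: no actions


def value_iteration(horizon=100, gamma=1.0):
    """V_k(s) = max_a sum_s' P(s'|s,a) * (R(s,a,s') + gamma * V_{k-1}(s'))."""
    values = {s: 0.0 for s in STATES}  # V_0: zero steps left, zero future reward
    policy = {}
    for _ in range(horizon):
        new_values = dict(values)
        for s in STATES:
            actions = [a for (st, a) in TRANSITIONS if st == s]
            if not actions:
                continue  # absorbing state: value stays 0
            q = {a: sum(p * (r + gamma * values[s2])
                        for (s2, p, r) in TRANSITIONS[(s, a)])
                 for a in actions}
            policy[s] = max(q, key=q.get)
            new_values[s] = q[policy[s]]
        values = new_values
    return values, policy


values, policy = value_iteration(horizon=100)
print(policy)  # with these numbers: {'cool': 'fast', 'warm': 'slow'}
print(values)  # expected total reward with 100 steps remaining in each state
```

With these numbers, the backups settle on the intuitive policy: drive fast while cool and slow down while warm, since overheating forfeits all future reward.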