Search code examples
c++pseudocodemarkov

Converting Pseudocode to C++


I am trying to learn about Markov decision problems and I was given the algorithm for Value Iteration, but I am confused how to turn them into actual C++ code. Mainly the parts where summations and such occur. Here is the algorithm:

function VALUE-ITERATION(P;R) returns a utility matrix
    inputs: P, a transition-probability matrix
            R, a reward matrix
    local variables: U, utility matrix, initially identical to R
                     U', utility matrix, initially identical toR
    repeat
         U <- U'
         for each state i do
             U'(s_i) <-  R(s_i) + max_a Summation_j P^a_ij*U(s_j)
         end
    until max_(s_i) |U(s_i) - U'(s_i)| < e
return U

This looks like hieroglyphics to me, is there a simpler algorithm that would be of more help to me? Or could somebody dumb it down for me?


Solution

  • I found this article quite readily: Value iteration and policy iteration algorithms for Markov decision problem [PDF file]. It explains quite a bit more what's going on.

    Conceptually, you have a system that can be in a number of states, rewards for transitions from one state to another, and actions that sometimes can result in state transitions. The basic idea is to keep iterating until you have arrived at a utility matrix that doesn't change That's what that final test max_(s_i) | U(s_i) - U'(s_i)| < e looks for. (Here, e is short for epsilon, a small number, and probably should be an additional input.)

    For each iteration, you want to take the best action for each state. The best action is the one that yields the greatest reward, weighted by probability. That's what max_a Summation_j P^a_ij*U(s_j) does: Find the action that yields the best reward, weighted by probability.