Tags: arrays, julia, reinforcement-learning, probability-theory

Julia way to write a k-step look-ahead function?


Suppose I have two arrays representing a probabilistic graph:

      2
     / \
1 ->     4 -> 5 -> 6 -> 7
     \ /
      3  

The probability of going to state 2 is 0.81 and the probability of going to state 3 is (1 - 0.81) = 0.19. My arrays hold the estimated values of the states as well as the rewards (each array index corresponds to its respective state):

V = [0, 3, 8, 2, 1, 2, 0]
R = [0, 0, 0, 4, 1, 1, 1]

The context doesn't matter so much; it's just to give an idea of where I'm coming from. I need to write a k-step look-ahead function that sums the discounted rewards along the way and adds the estimated value of the kth state.

So far I have managed this by writing a separate function for each look-ahead depth. My goal in asking this question is to figure out how to refactor this code so that I don't repeat myself and so that it is idiomatic Julia.

Here is an example of what I am talking about:

function E₁(R::AbstractVector, V::AbstractVector, P::Float64)
    # one-step look-ahead: branch to state 2 with probability P, to state 3 with 1 - P
    V[1] + P*(R[1] + V[2]) + (1 - P)*(R[2] + V[3])
end

function E₂(R::AbstractVector, V::AbstractVector, P::Float64)
    # two-step look-ahead: both branches merge in state 4
    V[1] + P*(R[1] + R[3]) + (1 - P)*(R[2] + R[4]) + V[4]
end

function E₃(R::AbstractVector, V::AbstractVector, P::Float64)
    # three-step look-ahead: one more deterministic step to state 5
    V[1] + P*(R[1] + R[3]) + (1 - P)*(R[2] + R[4]) + R[5] + V[5]
end

.
.
.

So on and so forth. It seems that if I were to ignore E₁() this would be exceptionally easy to refactor. But because I have to discount the value estimate at two different states, I'm having trouble thinking of a way to generalize this to k steps.

Obviously I could write a single function that takes the number of steps as an integer and then works through a bunch of if-statements, but that doesn't seem in the spirit of Julia. Any ideas on how I could refactor this? A closure of some sort? A different data type for storing R and V?


Solution

  • It seems like you essentially have a discrete Markov chain. So the standard way would be to store the graph as its transition matrix:

    T = zeros(7,7)
    T[1,2] = 0.81
    T[1,3] = 0.19
    T[2,4] = 1
    T[3,4] = 1
    T[4,5] = 1
    T[5,6] = 1
    T[6,7] = 1
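
    A quick way to check that the matrix matches the graph is to look at the row sums: every state with an outgoing edge should have a row that sums to 1, and the terminal state 7 should sum to 0. For example:

    sum(T, dims=2)   # outgoing probabilities per state; rows 1 through 6 should each sum to 1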
    

    Then you can calculate the probability of ending up in each state, given an initial distribution, by multiplying by T' from the left (row i of T holds the outgoing probabilities of state i, so the matrix has to be transposed before it can act on a column vector of state probabilities):

    julia> T' * [1,0,0,0,0,0,0] # starting from (1)
    7-element Array{Float64,1}:
     0.0 
     0.81
     0.19
     0.0 
     0.0 
     0.0 
     0.0 
    

    Likewise, the probability of ending up in each state after k steps can be calculated using powers of T':

    julia> T' * T' * [1,0,0,0,0,0,0]
    7-element Array{Float64,1}:
     0.0
     0.0
     0.0
     1.0
     0.0
     0.0
     0.0
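
    For an arbitrary number of steps the same thing can be written as a matrix power, for example (a small sketch, with k as the step count):

    k = 2
    p = (T')^k * [1,0,0,0,0,0,0]   # state distribution after k steps, starting from (1)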
    

    Now that you have the probabilities of being in each state after k steps, you can easily calculate expectations over the rewards and values as well. It may also pay off to define T as a sparse matrix.
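
    Putting these pieces together, a k-step look-ahead in the spirit of the question could be sketched roughly as below. The function name lookahead, the discount factor γ (γ = 1 means no discounting), and the convention that a reward is collected on entering a state are illustrative choices rather than something fixed by the question, so the reward indexing may need adjusting to match the Eₖ functions above:

    using LinearAlgebra, SparseArrays

    # sparse transition matrix: rows are source states, columns are destination states
    T = spzeros(7, 7)
    T[1,2] = 0.81; T[1,3] = 0.19
    T[2,4] = 1.0;  T[3,4] = 1.0
    T[4,5] = 1.0;  T[5,6] = 1.0; T[6,7] = 1.0

    R = [0.0, 0.0, 0.0, 4.0, 1.0, 1.0, 1.0]   # reward for entering each state (assumed convention)
    V = [0.0, 3.0, 8.0, 2.0, 1.0, 2.0, 0.0]   # current value estimates

    # k-step look-ahead from state s: expected discounted reward collected over the
    # next k steps, plus the discounted value estimate of the state reached at step k
    function lookahead(T, R, V, s::Integer, k::Integer; γ = 1.0)
        p = zeros(size(T, 1))
        p[s] = 1.0                        # all probability mass starts on state s
        total = 0.0
        for step in 1:k
            p = T' * p                    # state distribution after this many steps
            total += γ^step * dot(p, R)   # expected reward collected at this step
        end
        total + γ^k * dot(p, V)           # bootstrap with the value estimate at step k
    end

    lookahead(T, R, V, 1, 2)              # two-step look-ahead starting from state (1)

    Because the state distribution p is carried along as a vector, the two branches through states 2 and 3 are handled automatically, which is exactly the part that made the hand-written Eₖ functions awkward to generalize.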