python, reinforcement-learning, probability-distribution

Choosing a random value from a discrete distribution


I came across the following code while reading up on RL. The probs vector contains the probability of each action being taken, and I believe the loop chooses an action at random from that distribution. Why/how does this work?

  import numpy as np

  a = 0
  rand_select = np.random.rand()        # uniform draw in [0, 1)
  while True:
      rand_select -= probs[a]           # subtract the probability of action a
      if rand_select < 0 or a + 1 == n_actions:
          break
      a += 1
  actions = a                           # index of the selected action

After going through similar code, I realised that "actions" contains the final action to be taken.


Solution

  • You can view the probabilities as contiguous segments that partition the line from 0.0 to 1.0.

    If we have A: 0.2, B: 0.3, C: 0.5, the line could be divided like this:

    0.0 --A--> 0.2
    0.2 --B--> 0.5
    0.5 --C--> 1.0
    

    The segments sum to 1.0 in total.

    The algorithm chooses a random location between 0.0 and 1.0 and finds out where it "landed" (A, B or C) by sequentially ruling out segments.

    Suppose we draw 0.73. We can "visualize" it like this (selection marked with *):

    0.0 ---------------------------> 1.0
                              *
    0.0 --A--> 0.2 --B--> 0.5 --C--> 1.0
    

    0.73 - 0.2 > 0, so we subtract 0.2 (leaving 0.53), rule out A, and are left with:

    0.2 --B--> 0.5
    0.5 --C--> 1.0
    

    0.53 - 0.3 > 0, so we subtract 0.3 (leaving 0.23), rule out B, and are left with:

    0.5 --C--> 1.0
    

    0.23 - 0.5 < 0 so we know the part we drew was C.

    The selection is distributed according to the given probabilities, and the algorithm is O(n), where n is the number of probabilities. The two short sketches below illustrate this.
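
    To make the segment picture concrete, here is a short sketch (an illustration of my own, not from the original code) that builds the cumulative boundaries 0.2, 0.5, 1.0 with np.cumsum and locates a draw with np.searchsorted:

        import numpy as np

        probs = np.array([0.2, 0.3, 0.5])      # A, B, C
        boundaries = np.cumsum(probs)          # [0.2, 0.5, 1.0]

        draw = 0.73                            # the example draw from above
        action = np.searchsorted(boundaries, draw, side="right")
        print(action)                          # 2, i.e. C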
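
    And a minimal, runnable version of the loop from the question, assuming probs sums to 1.0; the wrapper function sample_action and the frequency check are illustrative additions:

        import numpy as np

        probs = [0.2, 0.3, 0.5]                # A, B, C
        n_actions = len(probs)

        def sample_action(probs, n_actions):
            # Draw a uniform point in [0, 1) and walk the segments
            # 0.0 --A--> 0.2 --B--> 0.5 --C--> 1.0, subtracting each
            # segment's width until the point falls inside one.
            a = 0
            rand_select = np.random.rand()
            while True:
                rand_select -= probs[a]
                if rand_select < 0 or a + 1 == n_actions:
                    break
                a += 1
            return a

        # Empirical check: frequencies should approach [0.2, 0.3, 0.5].
        counts = np.zeros(n_actions)
        for _ in range(100_000):
            counts[sample_action(probs, n_actions)] += 1
        print(counts / counts.sum())

    NumPy's built-in np.random.choice(len(probs), p=probs) performs an equivalent weighted draw in one call.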