python reinforcement-learning probability-distribution

Choosing a random value from a discrete distribution

I had come across the following code while reading up about RL. The probs vector contains the probabilities of each action to be taken. And I believe the given loop tries to choose an action randomly from the given distribution. Why/How does this work?

  a = 0
  rand_select = np.random.rand()
  while True:
      rand_select -= probs[a]
      if rand_select < 0 or a + 1 == n_actions:
          break
      a += 1
      actions = a

After going through similar code, I realised that "actions" contains the final action to be taken.

Solution

You can view the probabilities as a distribution of contiguous parts on the line from 0.0 to 1.0.

if we have A: 0.2, B: 0.3, C: 0.5 to the line could be

0.0 --A--> 0.2
0.2 --B--> 0.5
0.5 --C--> 1.0

And in total 1.0.

The algorithm is choosing a random location between 0.0->1.0 and finds out where it "landed" (A, B or C) by sequentially ruling out parts.

Suppose we draw 0.73, We can "visualize" it like this (selection marked with *):

0.0 ---------------------------> 1.0
                          *
0.0 --A--> 0.2 --B--> 0.5 --C--> 1.0

0.73 - 0.2 > 0 so we reduce 0.2 (=0.53) and are left with:

0.2 --B--> 0.5
0.5 --C--> 1.0

0.53 - 0.3 > 0 so we reduce 0.5 (=0.23) and are left with:

0.5 --C--> 1.0

0.23 - 0.5 < 0 so we know the part we drew was C.

The selection distributes the same as the probabilities and the algorithm is O(n) where n is the number of probabilities.