Search code examples
machine-learningreinforcement-learningq-learning

Boltzman exploration with more than two actions in Q-learning


I am using Boltzman exploration in Q-learning where I have at least 10 actions in each state. I know that with only two actions, Boltzman exploration can be applied quite simply as follows:

  1. Calculate pr1 and pr2 for the two actions with the Boltzman exploration equation.
  2. Generate a random number r
  3. Assuming pr1>pr2. If r<=pr1, take action corresponding to probability pr1. If r>pr1, take action corresponding to pr2.

However, how can I do this with 10 actions? At each decision step, I update the probabilities of all the actions. This gives me a probability distribution of all the actions where the probability of best action is highest. How do I select action in this case using the Boltzman exploration?


Solution

  • Here is an excellent discussion of weighted random sampling: Darts, Dice, and Coins.

    And here is my implementation of the Vose's Alias method:

    class WeightedRandom
    {
        private alias : array[int];
        private prob  : array[double];
    
        private random : Random;
    
        public this(p : array[double], random : Random)
        {
            this.random = random;
    
            def n = p.Length;
    
            alias = array(n);
            prob  = array(n);
    
            def small = Queue(n);
            def large = Queue(n);
    
            def p = p.Map(_ * n : double);
    
            foreach (x in p with i)
                (if (x < 1.0) small else large).Enqueue(i);
    
            while (!small.IsEmpty && !large.IsEmpty)
            {
                def s = small.Dequeue();
                def l = large.Dequeue();
                prob[s]  = p[s];
                alias[s] = l;
                p[l] = p[l] + p[s] - 1;
                (if (p[l] < 1.0) small else large).Enqueue(l);
            }
    
            while (!large.IsEmpty)
                prob[large.Dequeue()] = 1.0;
    
            while (!small.IsEmpty)
                prob[small.Dequeue()] = 1.0;
        }
    
        public NextIndex() : int
        {
            def i = random.Next(prob.Length);
            if (random.NextDouble() < prob[i])
                i;
            else
                alias[i];
        }
    }