Search code examples
rubystatisticsprobabilityprobability-theory

Calculate current probability in respect to prior probabilities to get uniform distribution


Disclaimer: not sure if title of question is accurate...

Let's say I need to wash the dishes within the next 100 minutes. I know I MUST do it after 100 minutes as the latest. I ask myself every 2 minutes "should I do it or not?" (so I ask myself a total of 50 times). Each time I ask myself if I should do it or not, I wanna calculate a probability of me doing it, that will then determine if I do it now or not. The probability should be so that if I repeat the experiment a million times, then I would get an almost even likelihood of me doing it after 2 minutes, 4 minutes, 6 minutes etc. etc. - or in other words a uniform distribution of the probability of doing it.

I have made an experiment where I set the individual probability to check_number / total_number_of_checks However this does not seem to give the right result. Any ideas?

# gem install ascii_charts
require 'ascii_charts'

time_frame = 100
check_frequenzy = 2
number_of_checks = time_frame / check_frequenzy

checks = (1..number_of_checks)

result = {}
checks.each do |i|
  result[i] = 0
end

10_000.times do |t|
  checks.each do |check_number|
    probability = check_number/number_of_checks.to_f
    if rand() <= probability
      # Oh no! I need to do the dishes now :-(
      result[check_number] += 1
      break
    end
  end
end

puts AsciiCharts::Cartesian.new(result.to_a).draw

#=>

950|                                                                                                                                                      
900|                *  *  *                                                                                                                               
850|                         *                                                                                                                            
800|             *                                                                                                                                        
750|                            *                                                                                                                         
700|          *                                                                                                                                           
650|                               *                                                                                                                      
600|       *                          *                                                                                                                   
550|                                                                                                                                                      
500|                                     *                                                                                                                
450|                                                                                                                                                      
400|    *                                                                                                                                                 
350|                                        *                                                                                                             
300|                                           *                                                                                                          
250|                                                                                                                                                      
200| *                                            *                                                                                                       
150|                                                 *                                                                                                    
100|                                                    *                                                                                                 
 50|                                                       *  *  *                                                                                        
  0+----------------------------------------------------------------*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*-
     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 

Solution

  • Given N opportunities to accomplish a task, with each opportunity having an equal probability of 1/N, you can pick a random opportunity on-the-fly using the following algorithm. (I don't know Ruby so I used pseudocode.)

    for i from 1 to N
        r = random integer from 0 to N-i inclusive
        if r == 0
            return 2*i  // wash dishes now, at the ith opportunity (2*i minutes)
                        // otherwise continue the loop
    

    For the case of N=50, this means the first opportunity has a probability of p(1) = 1/50. Afterward, the second opportunity has probability p(2) = 1/49. After this, p(3) = 1/48, ans so on until p(50) = 1/1, meaning we must do the task at the last opportunity if we haven't already.


    We can confirm this gives a uniform probability for each opportunity just by multiplying the individual probabilities up until a given point. For example, the chance of taking the 4th opportunity...

    • the probability of not doing the task on the 1st opportunity is 49/50
    • the probability of not doing the task on the 2nd opportunity is 48/49
    • the probability of not doing the task on the 3rd opportunity is 47/48
    • the probability of doing the task on the 4th opportunity is 1/47

    The product would be 49/50 * 48/49 * 47/48 * 1/47 = 1/50, as desired.


    In other words: probability = check_number/number_of_checks.to_f needs to be changed to: probability = 1.0/(number_of_checks - check_number)

    This will render:

    260|                                                                                                                *                                     
    240|                                           *                                                                       *                                  
    220|             *                                *  *              *              *     *              *     *  *        *  *  *              *  *       
    200| *  *  *  *           *  *              *           *  *  *  *     *  *  *  *           *        *     *                       *  *              *    
    180|                *  *        *  *  *  *                                            *        *  *                                      *  *             
    160|                                                                                                                                                      
    140|                                                                                                                                                      
    120|                                                                                                                                                      
    100|                                                                                                                                                      
     80|                                                                                                                                                      
     60|                                                                                                                                                      
     40|                                                                                                                                                      
     20|                                                                                                                                                      
      0+----------------------------------------------------------------------------------------------------------------------------------------------------*-
         1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 
    

    If you are especially curious, you can actually generate a uniform distribution even if you don't know number of opportunities N ahead of time. See reservoir sampling for how to do this.