Looking for general principles for debugging randomized programs, as well as any specific guidelines for doing it in Python.
As example consider the following implementation of insertion in Skip Lists:
#inserts key into the lowest level list and then promotes it upwards based on coin flips
def insert(self,key):
new_node = SkipListNode()
new_node.val = key
#insert in order in the lowest list
self.orderedInsert(new_node,self.search(key,True))
flip = random.randint(0,1)
level = 0
level_node= new_node
#promote upwards based on coin flips
while flip == 1:
level_up_node = SkipListNode()
level_up_node.val = key
#see if an upper level exists, if not create it
if(len(self.lists)-1<=level):
...
#upper level exists, move back find the first element with an up
#insert new node as it's next
else:
...
...
level +=1
flip = random.randint(0,1)
Here insert
depends on random coin flips. So a bug in that function becomes hard to detect because it may or may not appear in each run. How do we simplify debugging in this case?
As a general answer to the question "How do I debug a randomized algorithm":
First, you have to realize that the random number generator used by the random
module is a pseudo-random-number generator. It initializes itself with a seed (a starting number) and then uses that to deterministically generate a seemingly random sequence. If it were seeded again with the same initial number it would generate the same sequence.
With this knowledge we can look at the documentation for random
and see that it does let you pick the seed yourself. It also tells you what it normally uses for the seed: https://docs.python.org/2/library/random.html#random.seed
So this gives us this code which you can put in your randomized algorithm:
import os
import time
import random
def init_rand(seed=None):
if seed is None:
try:
seed = os.urandom(8)
except NotImplementedError:
seed = time.time()
print 'seed: %s' % seed
random.seed(seed)
Now if you always call init_rand
before running your randomized algorithm, by default your algorithm will work the same as before except it will also tell you which seed it used. If you ever have a bug which you need to debug then you can just call init_rand
using the seed which generated the bug and you will get the exact same behavior, giving you 100% reproducibility.