Search code examples
pythonalgorithmdebuggingnon-deterministicskip-lists

How to reproducibly debug a program dependent on a randomized algorithm?


Looking for general principles for debugging randomized programs, as well as any specific guidelines for doing it in Python.

As example consider the following implementation of insertion in Skip Lists:

#inserts key into the lowest level list and then promotes it upwards based on coin flips
    def insert(self,key):
        new_node = SkipListNode()
        new_node.val = key
        #insert in order in the lowest list
        self.orderedInsert(new_node,self.search(key,True))
        flip = random.randint(0,1)
        level = 0
        level_node= new_node
        #promote upwards based on coin flips
        while flip == 1:
            level_up_node = SkipListNode()
            level_up_node.val = key
            #see if an upper level exists, if not create it
            if(len(self.lists)-1<=level):
                ...
            #upper level exists, move back find the first element with an up
            #insert new node as it's next
            else:
                ...
            ...
            level +=1
            flip = random.randint(0,1)

Here insert depends on random coin flips. So a bug in that function becomes hard to detect because it may or may not appear in each run. How do we simplify debugging in this case?


Solution

  • As a general answer to the question "How do I debug a randomized algorithm":

    First, you have to realize that the random number generator used by the random module is a pseudo-random-number generator. It initializes itself with a seed (a starting number) and then uses that to deterministically generate a seemingly random sequence. If it were seeded again with the same initial number it would generate the same sequence.

    With this knowledge we can look at the documentation for random and see that it does let you pick the seed yourself. It also tells you what it normally uses for the seed: https://docs.python.org/2/library/random.html#random.seed

    So this gives us this code which you can put in your randomized algorithm:

    import os
    import time
    import random
    def init_rand(seed=None):
        if seed is None:
            try:
                seed = os.urandom(8)
            except NotImplementedError:
                seed = time.time()
        print 'seed: %s' % seed
        random.seed(seed)
    

    Now if you always call init_rand before running your randomized algorithm, by default your algorithm will work the same as before except it will also tell you which seed it used. If you ever have a bug which you need to debug then you can just call init_rand using the seed which generated the bug and you will get the exact same behavior, giving you 100% reproducibility.