Search code examples
pythonmultithreadingcontrol-flow

Flow control in threading.Thread


I Have run into a few examples of managing threads with the threading module (using Python 2.6).

What I am trying to understand is how is this example calling the "run" method and where. I do not see it anywhere. The ThreadUrl class gets instantiated in the main() function as "t" and this is where I would normally expect the code to start the "run" method.

Maybe this is not the preferred way of working with threads? Please enlighten me:

#!/usr/bin/env python

import Queue
import time
import urllib2
import threading
import datetime

hosts = ["http://example.com/", "http://www.google.com"]

queue = Queue.Queue()

class ThreadUrl(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, queue):
            threading.Thread.__init__(self)
            self.queue = queue

    def run(self):
            while True:
                    #grabs host from queue
                    host = self.queue.get()

                    #grabs urls of hosts and prints first 1024 bytes of page
                    url = urllib2.urlopen(host)
                    print url.read(10)

                    #signals to queue job is done
                    self.queue.task_done()

start = time.time()

def main():

    #spawn a pool of threads, and pass them queue instance
    for i in range(1):
            t = ThreadUrl(queue)
            t.setDaemon(True)
            t.start()

            for host in hosts:
                    queue.put(host)

    queue.join()
main()
print "Elapsed time: %s" % (time.time() - start)

Solution

  • Per the pydoc:

    Thread.start()

    Start the thread’s activity.

    It must be called at most once per thread object. It arranges for the object’s run() method to be invoked in a separate thread of control.

    This method will raise a RuntimeException if called more than once on the same thread object.

    The way to think of python Thread objects is that they take some chunk of python code that is written synchronously (either in the run method or via the target argument) and wrap it up in C code that knows how to make it run asynchronously. The beauty of this is that you get to treat start like an opaque method: you don't have any business overriding it unless you're rewriting the class in C, but you get to treat run very concretely. This can be useful if, for example, you want to test your thread's logic synchronously. All you need is to call t.run() and it will execute just as any other method would.