Tags: python, multithreading, python-requests, urllib2, urlopen

urlopen/requests.get not working in threads created in imported modules


I have a problem with urlopen (and requests.get).

In my program, if I call it inside a thread (I tested with multiprocessing too) [update: a thread that has been created by an imported module], the request won't run until the program ends.

By "won't run" I mean it does not even start: the timeout (here 3 seconds) never fires, and no connection is made to the website.

Here is my simplified code:

import threading
import time
import urllib2

def dlfile(url):
    print 'Before request'
    r = urllib2.urlopen(url, timeout=3)
    print 'After request'
    return r

def dlfiles(*urls):
    # One thread per URL; the threads are started but never joined.
    threads = [threading.Thread(None, dlfile, None, (url,), {}) for url in urls]
    for t in threads:
        t.start()

def main():
    dlfiles('http://google.com')

main()
time.sleep(10)
print 'End of program'

My output:

Before request
End of program
After request

Unfortunately, the code I'm posting here on SO works as expected (i.e. "Before request / After request / End of program"), and I can't reproduce the problem with simplified code yet.

I'm still trying, but in the meantime I'd like to know whether anyone has ever encountered this weird behaviour and what could cause it. Note that if I don't use a thread, everything is fine.

Thanks for any help you can provide. I'm kind of lost, and even the interwebs have no idea about this.

UPDATE

Here is how to reproduce the behaviour:

threadtest.py

import threading
import time
import urllib2

def log(a):
    print(a)

def dlfile(url):
    log('Before request')
    r = urllib2.urlopen(url, timeout=3)
    log('After request')
    return r

def dlfiles(*urls):
    # One thread per URL; the threads are started but never joined.
    threads = [threading.Thread(None, dlfile, None, (url,), {}) for url in urls]
    for t in threads:
        t.start()

def main():
    dlfiles('http://google.com')

# Everything below runs at module level, i.e. also at import time.
main()
for i in range(5):
    time.sleep(1)
    log('Sleep')
log('End of program')

threadtest-import.py

import threadtest

The outputs are then:

$ python threadtest.py
Before request
After request
Sleep
Sleep
Sleep
Sleep
Sleep
End of program

$ python threadtest-import.py
Before request
Sleep
Sleep
Sleep
Sleep
Sleep
End of program
After request

Now that I've found how to reproduce it: is this behaviour normal? Expected?

And how can I get rid of it? I.e. how can I create, from an imported module, a thread in which urlopen loads as expected?


Solution

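  • I forgot to post the solution; thanks to @user3351750 for his comment.

    The problem is the structure of the files. threadtest-import.py imports threadtest, and while the module is being imported something* (I don't remember the exact mechanism) blocks. IIRC this has to do with the re module in urllib. The most likely culprit is Python 2's global import lock: the main thread holds it for as long as `import threadtest` is executing, so a thread spawned during that import stalls as soon as it needs to import anything itself (which urlopen apparently does internally) and only resumes once the import has finished. Sorry for not being clearer.

    The fix is to put the code of the imported module inside a function, so that importing it only declares things and starts no work. This is good practice for a reason, I guess.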

    I.e. do this:

    import threadtest #do nothing except declarations
    threadtest.run() #do the work
    

    Instead of this:

    import threadtest #declarations + work
    

    And put this code:

    main()
    for i in range(5):
        time.sleep(1)
        log('Sleep')
    log('End of program')
    

    inside the run function:

    def run():
        main()
        for i in range(5):
            time.sleep(1)
            log('Sleep')
        log('End of program')
    

    This way the thing* no longer blocks and everything works as expected.
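
For what it's worth, here is a minimal sketch of that layout as a whole module, under a few assumptions of mine. It is written for Python 3 rather than the Python 2 of the question. The `fetch` parameter and `fake_fetch` are hypothetical names, standing in for urlopen so the sketch runs without a network. It also joins the threads instead of sleeping, which the original code did not do.

```python
import threading
import time


def log(message):
    print(message)


def dlfile(url, fetch, results):
    """Fetch one URL with the supplied callable and store the result."""
    log('Before request')
    results[url] = fetch(url)
    log('After request')


def dlfiles(urls, fetch):
    """Start one thread per URL, wait for all of them, return the results."""
    results = {}
    threads = [
        threading.Thread(target=dlfile, args=(url, fetch, results))
        for url in urls
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # assumption: we wait here rather than sleeping
    return results


def fake_fetch(url):
    """Hypothetical stand-in for urlopen so the sketch works offline."""
    time.sleep(0.1)
    return 'body of ' + url


def run(fetch=fake_fetch):
    """Entry point: importing this module starts no thread; calling run() does."""
    results = dlfiles(['http://google.com'], fetch)
    log('End of program')
    return results


# Only do the work when executed as a script, never as an import side effect.
if __name__ == '__main__':
    run()
```

An importing script would then do `import threadtest` followed by `threadtest.run()`. For real downloads you would pass your own callable as `fetch`, for example one wrapping `urllib2.urlopen(url, timeout=3)` on Python 2.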