I have a problem with urlopen (and with requests.get).
In my program, if I call it inside a thread (I tested with multiprocessing too) [update: a thread created by an imported module], it won't run until the program ends.
By "won't run" I mean it doesn't even start: the timeout (here 3 seconds) never fires, and no connection is made to the website.
Here is my simplified code:

import threading
import time
import urllib2  # Python 2

def dlfile(url):
    print 'Before request'
    r = urllib2.urlopen(url, timeout=3)
    print 'After request'
    return r

def dlfiles(*urls):
    # One thread per URL
    threads = [threading.Thread(target=dlfile, args=(url,)) for url in urls]
    for t in threads:
        t.start()

def main():
    dlfiles('http://google.com')

main()
time.sleep(10)
print 'End of program'

My output:
Before request
End of program
After request
Unfortunately, the simplified code I posted here works as expected (i.e. "Before request/After request/End of program"), and I can't reproduce the problem with it yet.
I'm still trying, but in the meantime I'd like to know whether anyone has encountered this weird behaviour and what could cause it. Note that if I don't use a thread, everything is fine.
Thanks for any help you can provide. I'm kind of lost, and even the interwebs have no idea about this.
Here is how to reproduce the behaviour
threadtest.py

import threading
import time
import urllib2  # Python 2

def log(a):
    print(a)

def dlfile(url):
    log('Before request')
    r = urllib2.urlopen(url, timeout=3)
    log('After request')
    return r

def dlfiles(*urls):
    # One thread per URL
    threads = [threading.Thread(target=dlfile, args=(url,)) for url in urls]
    for t in threads:
        t.start()

def main():
    dlfiles('http://google.com')

# Module-level work: this also runs when the file is imported
main()
for i in range(5):
    time.sleep(1)
    log('Sleep')
log('End of program')

threadtest-import.py
import threadtest
The two scripts then produce this output:
$ python threadtest.py
Before request
After request
Sleep
Sleep
Sleep
Sleep
Sleep
End of program
$ python threadtest-import.py
Before request
Sleep
Sleep
Sleep
Sleep
Sleep
End of program
After request
Now that I can reproduce it: is this behaviour normal or expected?
And how can I get rid of it? That is, how can a thread created from an imported module call urlopen and have it run as expected?
I forgot to post the solution; thanks to @user3351750 for the comment.
The problem is the structure of the files. threadtest-import.py imports threadtest, and while the module is being imported, something* (I don't remember the exact mechanism) blocks the thread. IIRC this has to do with the re module in urllib. Sorry for not being clearer.
The fix is to put the code of the imported module inside a function, so that importing it only runs declarations. I guess this is considered good practice for a reason.
I.e. do this:

import threadtest  # does nothing except declarations
threadtest.run()   # does the work

Instead of this:

import threadtest  # declarations + work

And put the code

main()
for i in range(5):
    time.sleep(1)
    log('Sleep')
log('End of program')

inside the run function:

def run():
    main()
    for i in range(5):
        time.sleep(1)
        log('Sleep')
    log('End of program')

This way the thing* no longer blocks and everything works as expected.
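
For reference, here is a minimal sketch of what the whole restructured threadtest.py could look like. A likely explanation for the blocking (not confirmed in this post) is Python 2's global import lock: it is held while a module is being imported, so a thread started during that import stalls as soon as it triggers an import of its own. The sketch makes a few assumptions that are not in the original code: the fetch parameter is hypothetical and only there so the download function can be swapped out, run() joins the threads instead of sleeping, and the try/except import is only there so the same file also loads on Python 3.

```python
import threading

try:
    from urllib2 import urlopen          # Python 2, as in the question
except ImportError:
    from urllib.request import urlopen   # Python 3 equivalent


def log(a):
    print(a)


def dlfile(url):
    log('Before request')
    r = urlopen(url, timeout=3)
    log('After request')
    return r


def dlfiles(urls, fetch=dlfile):
    # `fetch` is a hypothetical parameter added for this sketch so that
    # the download function can be replaced, e.g. for an offline test.
    threads = [threading.Thread(target=fetch, args=(url,)) for url in urls]
    for t in threads:
        t.start()
    return threads


def run(urls=('http://google.com',), fetch=dlfile):
    # All the work lives in this function, so a plain
    # `import threadtest` starts no threads at all.
    threads = dlfiles(urls, fetch)
    for t in threads:
        t.join()  # wait for the downloads instead of sleeping
    log('End of program')
```

The importing script then stays as above: import threadtest, followed by threadtest.run(). If the file should also work when started directly with python threadtest.py, calling run() under an if __name__ == '__main__': guard at the bottom of the module is the usual way to do that without bringing back the work-on-import problem.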