Hi all,
I am using eventlet to implement a web crawler. My code is like this:
import eventlet
import time
import urllib2

urls = [
    "http://deeplearning.stanford.edu/wiki/index.php/UFLDL%E6%95%99%E7%A8%8B",
    "http://www.google.com.hk",
    "http://www.baidu.com",
]

def fetch(url):
    print('entering fetch')
    body = urllib2.urlopen(url).read()
    print('body')

pool = eventlet.GreenPool(100)
for url in urls:
    pool.spawn(fetch, url)
time.sleep(10)
but it outputs nothing, and it seems that fetch never runs at all.
By the way, pool.imap does work.
What happened?
What I actually want to do is handle URLs that arrive as a stream, i.e. one by one, like this:

while True:
    url = getOneUrl()       # get one url from the stream
    pool.spawn(fetch, url)  # fetch the url

but that does not work either.
Thanks in advance.
According to the eventlet implementation, pool.imap waits until all greenthreads in the pool finish working, but pool.spawn does not; it returns immediately.
You can try appending some waiting or sleeping at the end of your script. Then those spawned greenthreads will execute your function:
pool.waitall()
or
eventlet.sleep(10)
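For example, here is a minimal sketch of your script with pool.waitall() appended. I have also swapped in eventlet's green urllib2 so the fetches can actually overlap; that swap is my addition, not strictly required for the output to appear:

import eventlet
from eventlet.green import urllib2  # cooperative sockets, so fetches overlap (my addition)

urls = [
    "http://deeplearning.stanford.edu/wiki/index.php/UFLDL%E6%95%99%E7%A8%8B",
    "http://www.google.com.hk",
    "http://www.baidu.com",
]

def fetch(url):
    print('entering fetch')
    body = urllib2.urlopen(url).read()
    print('fetched %d bytes from %s' % (len(body), url))

pool = eventlet.GreenPool(100)
for url in urls:
    pool.spawn(fetch, url)
pool.waitall()  # block until every spawned greenthread has finished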
Actually, in 'for body in pool.imap(fetch, urls)', the call to pool.imap itself does not wait; iterating over the results is what does. Try it without iterating the result: without iteration, it returns immediately, just like pool.spawn.
pool = eventlet.GreenPool(100)
pool.imap(fetch, urls)
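Compare with the iterating form, which does drive the work (a sketch reusing fetch and urls from your question):

pool = eventlet.GreenPool(100)
for body in pool.imap(fetch, urls):
    # consuming the results here is what forces the greenthreads to run
    pass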
If you want to know more about this, just check the code in greenpool.py.
There is only one OS thread running all the green threads. Try the following in each green thread: the greenthread objects differ, but you get the same thread id every time.
print greenthread.getcurrent(), threading.current_thread()
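A self-contained sketch to demonstrate this (the imports are what that one-liner assumes):

import threading

import eventlet
from eventlet import greenthread

def whoami(i):
    # different greenthread objects, but the same OS thread every time
    print i, greenthread.getcurrent(), threading.current_thread()

pool = eventlet.GreenPool()
for i in range(3):
    pool.spawn(whoami, i)
pool.waitall()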
If you loop without calling eventlet.sleep, the main thread stays blocked and the other green threads never get a chance to be scheduled. So one possible solution for your problem is to call eventlet.sleep after invoking spawn in your while loop:
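A sketch of the streaming loop with that fix (getOneUrl and fetch are the names from your question; eventlet.sleep(0) just yields to the hub, and any positive value works too):

import eventlet

pool = eventlet.GreenPool(100)

while True:
    url = getOneUrl()        # get one url from the stream
    pool.spawn(fetch, url)   # schedule the fetch
    eventlet.sleep(0)        # yield so the spawned fetches get a chance to run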