When I hear about multithreaded programming, I think of it as an opportunity to speed up my program. But is that not the case?
import eventlet
from eventlet.green import socket
from iptools import IpRangeList


class Scanner(object):
    def __init__(self, ip_range, port_range, workers_num):
        self.workers_num = workers_num or 1000
        self.ip_range = self._get_ip_range(ip_range)
        self.port_range = self._get_port_range(port_range)
        self.scaned_range = self._get_scaned_range()

    def _get_ip_range(self, ip_range):
        return [ip for ip in IpRangeList(ip_range)]

    def _get_port_range(self, port_range):
        return [r for r in range(*port_range)]

    def _get_scaned_range(self):
        for ip in self.ip_range:
            for port in self.port_range:
                yield (ip, port)

    def scan(self, address):
        try:
            return bool(socket.create_connection(address))
        except:
            return False

    def run(self):
        pool = eventlet.GreenPool(self.workers_num)
        for status in pool.imap(self.scan, self.scaned_range):
            if status:
                yield True

    def run_std(self):
        for status in map(self.scan, self.scaned_range):
            if status:
                yield True


if __name__ == '__main__':
    s = Scanner(('127.0.0.1'), (1, 65000), 100000)
    import time
    now = time.time()
    open_ports = [i for i in s.run()]
    print 'Eventlet time: %s (sec) open: %s' % (now - time.time(),
                                                len(open_ports))
    del s
    s = Scanner(('127.0.0.1'), (1, 65000), 100000)
    now = time.time()
    # Plain sequential map() for comparison
    open_ports = [i for i in s.run_std()]
    print 'CPython time: %s (sec) open: %s' % (now - time.time(),
                                               len(open_ports))
And the results:
Eventlet time: -4.40343403816 (sec) open: 2
CPython time: -4.48356699944 (sec) open: 2
My question is: if I run this code on a server rather than on my laptop and set a higher number of workers, will it run faster than the CPython version? What are the advantages of threads?
ADD: So I rewrote the app to use CPython's native threads:
import socket
from threading import Thread
from Queue import Queue
from iptools import IpRangeList


class Scanner(object):
    def __init__(self, ip_range, port_range, workers_num):
        self.workers_num = workers_num or 1000
        self.ip_range = self._get_ip_range(ip_range)
        self.port_range = self._get_port_range(port_range)
        self.scaned_range = [i for i in self._get_scaned_range()]

    def _get_ip_range(self, ip_range):
        return [ip for ip in IpRangeList(ip_range)]

    def _get_port_range(self, port_range):
        return [r for r in range(*port_range)]

    def _get_scaned_range(self):
        for ip in self.ip_range:
            for port in self.port_range:
                yield (ip, port)

    def scan(self, q):
        while True:
            try:
                r = bool(socket.create_connection(q.get()))
            except Exception:
                r = False
            q.task_done()

    def run(self):
        queue = Queue()
        for address in self.scaned_range:
            queue.put(address)
        for i in range(self.workers_num):
            worker = Thread(target=self.scan, args=(queue,))
            worker.setDaemon(True)
            worker.start()
        queue.join()


if __name__ == '__main__':
    s = Scanner(('127.0.0.1'), (1, 65000), 5)
    import time
    now = time.time()
    s.run()
    print time.time() - now
And the result is:
Cpython's thread: 1.4 sec
I think this is a very good result. As a baseline, I take the nmap scanning time:
$ nmap 127.0.0.1 -p1-65000
Starting Nmap 5.21 ( http://nmap.org ) at 2012-10-22 18:43 MSK
Nmap scan report for localhost (127.0.0.1)
Host is up (0.00021s latency).
Not shown: 64986 closed ports
PORT      STATE SERVICE
53/tcp    open  domain
80/tcp    open  http
443/tcp   open  https
631/tcp   open  ipp
3306/tcp  open  mysql
6379/tcp  open  unknown
8000/tcp  open  http-alt
8020/tcp  open  unknown
8888/tcp  open  sun-answerbook
9980/tcp  open  unknown
27017/tcp open  unknown
27634/tcp open  unknown
28017/tcp open  unknown
39900/tcp open  unknown
Nmap done: 1 IP address (1 host up) scanned in 0.85 seconds
My question now is: how are threads implemented in Eventlet? As far as I understand, these are not real threads but something specific to Eventlet, so why don't they speed up the task?
Eventlet is used by many major projects, such as OpenStack. But why? Just to run heavy DB queries asynchronously, or for something else?
CPython threads:
Each CPython thread maps to an OS-level thread (a lightweight process, i.e. a pthread on POSIX systems).
If many CPython threads execute Python code concurrently, the global interpreter lock (GIL) allows only one of them to interpret Python at any one time. The remaining threads block on the GIL whenever they need to interpret Python instructions. With many Python threads, this slows things down a lot.
If your Python code spends most of its time inside networking operations (send, connect, etc.), fewer threads are fighting for the GIL to interpret code, so the effect of the GIL is not so bad.
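The point about blocking calls can be illustrated with a small sketch. It is written for Python 3 using only the standard library (the question's code is Python 2), and `time.sleep` stands in for a blocking network call, which is an assumption made for illustration: like a blocking socket call, it releases the GIL while waiting. The names `blocking_call` and `timed_threads` are hypothetical.

```python
import threading
import time


def blocking_call(delay):
    # Stand-in for a blocking network operation (assumption for illustration);
    # the GIL is released while the thread sleeps.
    time.sleep(delay)


def timed_threads(worker_count, delay):
    """Run `worker_count` blocking calls in OS threads; return elapsed seconds."""
    threads = [threading.Thread(target=blocking_call, args=(delay,))
               for _ in range(worker_count)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start


if __name__ == '__main__':
    # The five waits overlap, so the total should be far below 5 * 0.2 sec.
    print('elapsed: %.2f sec' % timed_threads(5, 0.2))
```

If the waits were replaced by pure Python computation, the threads would instead take turns holding the GIL and the total would grow roughly with the number of threads.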
Eventlet/Green threads:
From the above we know that CPython has a performance limitation with threads. Eventlet tries to solve this problem by running everything in a single thread on a single core and using non-blocking I/O for everything.
Green threads are not real OS-level threads. They are a user-space abstraction for concurrency. Most importantly, N green threads map to 1 OS thread. This avoids the GIL problem.
Green threads cooperatively yield to each other instead of being preemptively scheduled. For networking operations, the socket libraries are either patched at run time (monkey patching) or replaced by green equivalents such as eventlet.green.socket, so that all calls are non-blocking.
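Cooperative yielding can be sketched with plain generators. This is only a toy model in Python 3, not how Eventlet is implemented (Eventlet builds on greenlets and an event loop): each task runs until it reaches a `yield`, and a round-robin loop then resumes the next one. The names `worker` and `run_cooperatively` are hypothetical.

```python
from collections import deque


def worker(name, steps, log):
    for step in range(steps):
        log.append((name, step))
        # Hand control back to the scheduler, as a green thread would
        # when it reaches a call that would otherwise block.
        yield


def run_cooperatively(tasks):
    """Resume generator tasks in round-robin order until all have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # this task is done, do not reschedule it
        ready.append(task)


if __name__ == '__main__':
    log = []
    run_cooperatively([worker('a', 2, log), worker('b', 2, log)])
    print(log)
```

Nothing here runs in parallel: the two tasks interleave only because each one gives up control voluntarily, and a task that never yields would starve the others.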
So even when you create a pool of Eventlet green threads, you are actually using only one OS-level thread, and this single thread executes all of the green threads. The idea is that if all the networking calls are non-blocking, this should be faster than Python threads in some cases.
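The same single-thread, non-blocking model can be tried without Eventlet. The sketch below uses `asyncio` from the Python 3 standard library as a stand-in (an assumption for illustration; it is not Eventlet's API), and the names `probe`, `scan` and `open_ports` as well as the default limit and timeout are hypothetical.

```python
import asyncio


async def probe(host, port, timeout):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout)
    except (OSError, asyncio.TimeoutError):
        return False
    writer.close()
    try:
        await writer.wait_closed()
    except OSError:
        pass  # the peer may already have dropped the connection
    return True


async def scan(host, ports, limit=100, timeout=1.0):
    """Probe all ports from one OS thread, at most `limit` at a time."""
    gate = asyncio.Semaphore(limit)

    async def guarded(port):
        async with gate:
            return port, await probe(host, port, timeout)

    results = await asyncio.gather(*(guarded(p) for p in ports))
    return [port for port, is_open in results if is_open]


def open_ports(host, ports):
    # Synchronous entry point: runs the event loop until the scan completes.
    return asyncio.run(scan(host, ports))
```

A call such as `open_ports('127.0.0.1', range(1, 1025))` would return the list of ports that accepted a connection. The semaphore plays a role similar to the pool size in the question: it caps how many connection attempts are in flight at once.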
Summary
For your program above, "true" concurrency (the CPython version, 5 threads running on multiple processors) happens to be faster than the Eventlet model (a single thread running on 1 processor).
Some CPython workloads perform badly with many threads/cores (e.g. 100 clients connecting to a server, with one thread per client). Eventlet is an elegant programming model for such workloads, which is why it is used in several places.