What are the advantages of multithreaded programming in Python?

When I hear about multithreaded programming, I think of it as a way to speed up my program, but is that really the case?

import eventlet
from eventlet.green import socket
from iptools import IpRangeList


class Scanner(object):
    def __init__(self, ip_range, port_range, workers_num):
        self.workers_num = workers_num or 1000
        self.ip_range = self._get_ip_range(ip_range)
        self.port_range = self._get_port_range(port_range)
        self.scaned_range = self._get_scaned_range()

    def _get_ip_range(self, ip_range):
        return [ip for ip in IpRangeList(ip_range)]

    def _get_port_range(self, port_range):
        return [r for r in range(*port_range)]

    def _get_scaned_range(self):
        for ip in self.ip_range:
            for port in self.port_range:
                yield (ip, port)

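    def scan(self, address):
        # Return True if a TCP connection to (ip, port) succeeds.
        try:
            sock = socket.create_connection(address)
        except socket.error:
            return False
        sock.close()  # do not leak the connected socket
        return True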

    def run(self):
        pool = eventlet.GreenPool(self.workers_num)
        for status in pool.imap(self.scan, self.scaned_range):
            if status:
                yield True

    def run_std(self):
        for status in map(self.scan, self.scaned_range):
            if status:
                yield True


if __name__ == '__main__':
    s = Scanner('127.0.0.1', (1, 65000), 100000)
    import time
    now = time.time()
    open_ports = [i for i in s.run()]
    # Elapsed time is "end - start", so the printed value is positive.
    print 'Eventlet time: %s (sec) open: %s' % (time.time() - now,
                                                len(open_ports))
    del s
    s = Scanner('127.0.0.1', (1, 65000), 100000)
    now = time.time()
    # Use the sequential map() version for the plain CPython measurement.
    open_ports = [i for i in s.run_std()]
    print 'CPython time: %s (sec) open: %s' % (time.time() - now,
                                               len(open_ports))

and results:

Eventlet time: -4.40343403816 (sec) open: 2
CPython time: -4.48356699944 (sec) open: 2

My question is: if I run this code on a server rather than on my laptop and increase the number of workers, will it run faster than the plain CPython version? What are the advantages of threads?

ADD: I then rewrote the app using standard CPython threads:

import socket
from threading import Thread
from Queue import Queue

from iptools import IpRangeList

class Scanner(object):
    def __init__(self, ip_range, port_range, workers_num):
        self.workers_num = workers_num or 1000
        self.ip_range = self._get_ip_range(ip_range)
        self.port_range = self._get_port_range(port_range)
        self.scaned_range = [i for i in self._get_scaned_range()]

    def _get_ip_range(self, ip_range):
        return [ip for ip in IpRangeList(ip_range)]

    def _get_port_range(self, port_range):
        return [r for r in range(*port_range)]

    def _get_scaned_range(self):
        for ip in self.ip_range:
            for port in self.port_range:
                yield (ip, port)

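    def scan(self, q):
        # Worker loop: keep taking addresses off the queue until the program exits.
        while True:
            address = q.get()
            try:
                sock = socket.create_connection(address)
                sock.close()
                r = True
            except Exception:
                r = False
            q.task_done()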

    def run(self):
        queue = Queue()
        for address in self.scaned_range:
            queue.put(address)
        for i in range(self.workers_num):
            worker = Thread(target=self.scan, args=(queue,))
            worker.daemon = True  # let the program exit while workers wait on the queue
            worker.start()
        queue.join()


if __name__ == '__main__':
    s = Scanner('127.0.0.1', (1, 65000), 5)
    import time
    now = time.time()
    s.run()
    print time.time() - now

And the result is:

 Cpython's thread: 1.4 sec

I think this is a very good result. As a baseline, I use nmap's scan time:

$ nmap 127.0.0.1 -p1-65000

Starting Nmap 5.21 ( http://nmap.org ) at 2012-10-22 18:43 MSK
Nmap scan report for localhost (127.0.0.1)
Host is up (0.00021s latency).
Not shown: 64986 closed ports
PORT      STATE SERVICE
53/tcp    open  domain
80/tcp    open  http
443/tcp   open  https
631/tcp   open  ipp
3306/tcp  open  mysql
6379/tcp  open  unknown
8000/tcp  open  http-alt
8020/tcp  open  unknown
8888/tcp  open  sun-answerbook
9980/tcp  open  unknown
27017/tcp open  unknown
27634/tcp open  unknown
28017/tcp open  unknown
39900/tcp open  unknown

Nmap done: 1 IP address (1 host up) scanned in 0.85 seconds

So my question now is: how are threads implemented in Eventlet? As far as I understand, they are not real threads but something specific to Eventlet, so why don't they speed up the task?

Eventlet is used by many major projects, such as OpenStack. But why? Just to run heavy DB queries asynchronously, or for something else?

Solution

CPython threads:

Each CPython thread maps to an OS-level thread (a lightweight process / pthread on POSIX systems).
If many CPython threads execute Python code concurrently, the global interpreter lock (GIL) lets only one of them interpret Python bytecode at a time. The other threads block on the GIL whenever they need to run Python instructions, so with many CPU-bound threads this slows things down a lot.
If your Python code spends most of its time inside networking operations (send, connect, etc.), CPython releases the GIL while a thread waits on the blocking call, so fewer threads compete for it and the effect of the GIL is much smaller.
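The difference between the two kinds of workload can be seen with a small timing sketch. The names `io_task`, `cpu_task` and `timed` are made up for illustration, and `time.sleep` is only a stand-in for a blocking network call (both release the GIL while waiting).

```python
import threading
import time


def io_task():
    # Stand-in for a blocking network call; the GIL is released while sleeping.
    time.sleep(0.2)


def cpu_task():
    # Pure-Python loop; the thread needs the GIL for every bytecode it runs.
    total = 0
    for i in range(200000):
        total += i
    return total


def timed(task, n_threads):
    """Run `task` in n_threads threads and return the wall-clock time in seconds."""
    threads = [threading.Thread(target=task) for _ in range(n_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


if __name__ == '__main__':
    print('4 x io_task in threads:  %.2f sec' % timed(io_task, 4))
    print('4 x cpu_task in threads: %.2f sec' % timed(cpu_task, 4))
```

On a standard CPython build with the GIL, the four sleeping threads should finish in roughly the time of a single sleep, while the four CPU-bound threads are expected to take about as long as running the loops one after another.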

Eventlet/green threads:

From the above we know that CPython has a performance limitation with threads. Eventlet tries to solve the problem by using a single thread running on a single core and non-blocking I/O for everything.
Green threads are not real OS-level threads. They are a user-space abstraction for concurrency. Most importantly, N green threads map to 1 OS thread, which avoids the GIL problem.
Green threads cooperatively yield to each other instead of being scheduled preemptively. For networking operations, the socket libraries are patched at run time (monkey patching) so that all calls are non-blocking.
So even when you create a pool of Eventlet green threads, you are actually creating only one OS-level thread, and this single thread executes all the green threads. The idea is that if all the networking calls are non-blocking, this should be faster than Python threads in some cases.
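The "N green threads on 1 OS thread, yielding cooperatively" idea can be illustrated with plain generators. This is only a sketch of the scheduling model, not how Eventlet is implemented (Eventlet builds on the greenlet library and an event loop it calls the hub); `green_task` and `run_all` are hypothetical names.

```python
from collections import deque


def green_task(name, steps, log):
    # Each `yield` marks a point where the task would otherwise block on I/O;
    # it hands control back to the scheduler instead of waiting.
    for step in range(steps):
        log.append((name, step))
        yield


def run_all(tasks):
    """Round-robin scheduler: runs every task on the calling OS thread."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run until the task's next yield
        except StopIteration:
            continue            # task finished, drop it
        ready.append(task)      # not finished, schedule it again


if __name__ == '__main__':
    log = []
    run_all([green_task('a', 2, log), green_task('b', 2, log)])
    print(log)  # [('a', 0), ('b', 0), ('a', 1), ('b', 1)]
```

The tasks interleave even though only one OS thread ever runs, and a task that never yields would starve all the others. That is the trade-off of cooperative scheduling.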

Summary

For your program above, "true" concurrency (the CPython version, with 5 threads running on multiple processors) happens to be faster than the Eventlet model (a single thread running on one processor).

Some CPython workloads perform badly with many threads/cores (e.g. 100 clients connecting to a server with one thread per client). Eventlet is an elegant programming model for such workloads, which is why it is used in several places.
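For the scanner in the question, a thread-based version that also records which ports are open might look like the sketch below. It is a minimal illustration rather than a tuned scanner: the function names `is_open` and `scan`, the 1-second timeout and the default of 5 workers are all assumptions.

```python
import socket
import threading

try:
    import queue              # Python 3
except ImportError:
    import Queue as queue     # Python 2


def is_open(address, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        sock = socket.create_connection(address, timeout)
    except (socket.error, socket.timeout):
        return False
    sock.close()
    return True


def scan(addresses, workers_num=5):
    """Scan addresses with a fixed pool of threads; return the open ones."""
    todo = queue.Queue()
    for address in addresses:
        todo.put(address)
    open_addresses = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                address = todo.get_nowait()
            except queue.Empty:
                return                      # nothing left to scan
            if is_open(address):
                with lock:                  # protect the shared result list
                    open_addresses.append(address)

    threads = [threading.Thread(target=worker) for _ in range(workers_num)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(open_addresses)


if __name__ == '__main__':
    print(scan([('127.0.0.1', port) for port in range(1, 1025)]))
```

Because each worker spends almost all of its time waiting inside `create_connection`, the GIL is rarely contended here, which is consistent with the threaded version performing well in the question.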