
Python: cost of locking vs. performance (does multithreading make sense?)


I'm working on a project where the throughput of my code is quite important, and after some consideration I chose to make my program threaded.

The main thread and the subthread both add to and remove from two shared dictionaries. I've been looking around the interwebs for input on the performance of locking in Python: is it a slow operation, and so on.
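A minimal sketch of the setup described above, assuming the two shared dictionaries are plain `dict`s guarded by a single `threading.Lock` (the names `shared_a`, `shared_b` and `worker` are hypothetical). One lock around each compound update keeps the two dictionaries consistent with each other; every acquire/release is the locking cost the question is asking about.

```python
import threading

# Hypothetical shared state: two dicts updated by both threads.
shared_a = {}
shared_b = {}
lock = threading.Lock()  # one lock guards both dicts so they stay in sync


def worker(prefix, n):
    """Add n keys to both dicts, then remove every other one."""
    for i in range(n):
        key = f"{prefix}{i}"
        with lock:  # the update spans two dicts, so it must be atomic
            shared_a[key] = i
            shared_b[key] = i * 2
    for i in range(0, n, 2):
        key = f"{prefix}{i}"
        with lock:
            del shared_a[key]
            del shared_b[key]


sub = threading.Thread(target=worker, args=("sub-", 1000))
sub.start()
worker("main-", 1000)  # the main thread does the same work concurrently
sub.join()

print(len(shared_a), len(shared_b))
```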

So what I'm getting at is this: since Python isn't really multithreaded at all (I'm thinking of the GIL restricting it to one core), if I need high performance in my application, do I have anything to gain by making it threaded, other than for handling I/O?

EDIT

The actual question (after an insightful comment) is:

Does multithreading make sense in Python, given the GIL?


Solution

  • First of all, locking in any language can become a performance bottleneck. Minimize locking where possible; for example, don't use shared dictionaries. Instead, create a tree and have each thread work in a different branch of that tree.

    Since you'll be doing a lot of I/O, your performance problems will lie there and threading is not necessarily going to improve matters. Look into event-driven architectures first:

    The GIL is not likely to be your problem here; C code can release it when it doesn't need it, and CPython does so around blocking I/O calls, so other threads keep running while one thread waits. If it ever does become a bottleneck, move to multiple processes. On a large intranet cluster I administer, for example, we run 6 processes with 2 threads each to make full use of all the CPU cores (2 of the processes carry a very light load).

    If you feel you need multiple processes, either use the multiprocessing module or make it easy to start multiple instances of your server (each listening on a different port) and use a load balancer such as haproxy to direct traffic to each server.
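The advice to give each thread its own branch instead of sharing dictionaries can be sketched like this, assuming the work can be split into disjoint pieces up front. A flat list of per-thread dicts stands in for the tree, and the names `branches` and `worker` are hypothetical. No lock is taken while the threads run; the results are merged once, after all threads have joined.

```python
import threading

NUM_THREADS = 4

# Each thread gets its own "branch": a private dict no other thread touches.
branches = [{} for _ in range(NUM_THREADS)]


def worker(index, items):
    branch = branches[index]  # only this thread writes to this dict
    for item in items:
        branch[item] = item * item


data = list(range(100))
# Split the input so each thread owns a disjoint slice.
chunks = [data[i::NUM_THREADS] for i in range(NUM_THREADS)]

threads = [threading.Thread(target=worker, args=(i, chunk))
           for i, chunk in enumerate(chunks)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Merge once, after every thread has finished, so no lock is needed here either.
merged = {}
for branch in branches:
    merged.update(branch)

print(len(merged), merged[9])
```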
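As one example of the event-driven style the answer recommends looking into first (the standard library's `asyncio` is chosen here purely for illustration; it is not necessarily what the answer originally pointed to), many waits on I/O can overlap in a single thread with no locks at all. `fake_request` is a hypothetical stand-in for a network call.

```python
import asyncio
import time


async def fake_request(i):
    # Hypothetical network call; awaiting hands control back to the event loop.
    await asyncio.sleep(0.2)
    return i


async def main():
    # All 10 "requests" wait concurrently in one thread.
    return await asyncio.gather(*(fake_request(i) for i in range(10)))


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")  # roughly 0.2 s, not 10 * 0.2 s
```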
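The point that the GIL is released during blocking calls can be checked with a small sketch, assuming `time.sleep` is an acceptable stand-in for blocking I/O (CPython releases the GIL while sleeping, as it does for blocking socket and file reads). Five threads that each block for 0.2 s finish in about 0.2 s total, not 1 s.

```python
import threading
import time


def blocking_io():
    # Stand-in for a blocking I/O call; the GIL is released while it blocks.
    time.sleep(0.2)


def run_threads(n):
    threads = [threading.Thread(target=blocking_io) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = run_threads(5)
print(f"5 blocking calls in {elapsed:.2f}s")
```

On a standard CPython build, replacing `time.sleep` with a pure-Python loop removes the overlap, because such a loop holds the GIL while it runs; that CPU-bound case is what the advice about multiple processes addresses.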
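For CPU-bound work, the `multiprocessing` route mentioned in the answer could look like the sketch below. The function `cpu_bound` and the pool size of 4 are illustrative assumptions. Each worker process runs its own interpreter with its own GIL, so the pure-Python loops can run on separate cores.

```python
from multiprocessing import Pool


def cpu_bound(n):
    # Pure-Python CPU work: threads would take turns on the GIL,
    # but each worker process has its own interpreter and GIL.
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    # The __main__ guard is required with the "spawn" start method
    # (the default on Windows and macOS), which re-imports this module.
    with Pool(processes=4) as pool:
        results = pool.map(cpu_bound, [100_000] * 8)
    print(results[0])
```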