Tags: python, multithreading, process, multicore

Am I doing multi-core programming the right way here?


I want a daemon that finds images I need to convert into web and thumbnail versions. I thought Python could be useful here, but I'm not sure I'm doing things right. I want to convert 8 photos simultaneously, and the queue of images to be converted can be very long. We have several cores on the server, and spawning each convert in a new process should let the OS make use of the available cores so things go faster, right? This is the key point: to start a process from Python that in turn calls ImageMagick's convert program, and hope that this goes faster than running one convert at a time from the Python main thread.

So far I have only started testing, so here is my test code. It creates 20 tasks (each one sleeps for between 1 and 4 seconds) and gives those tasks to a pool of 5 worker threads.

from multiprocessing import Process
from subprocess import call
from random import randrange
from threading import Thread
from Queue import Queue

class Worker(Thread):
    def __init__(self, tid, queue):
        Thread.__init__(self)
        self.tid = tid
        self.queue = queue
        self.daemon = True
        self.start()

    def run(self):
        while True:
            sec = self.queue.get()
            print "Thread %d sleeping for %d seconds\n\n" % (self.tid, sec)
            p = Process(target=work, args=(sec,))
            p.start()
            p.join()
            self.queue.task_done()

class WorkerPool:
    def __init__(self, num_workers):
        self.queue = Queue()
        for tid in range(num_workers):
            Worker(tid, self.queue)

    def add_task(self, sec):
        self.queue.put(sec)

    def complete_work(self):
        self.queue.join()

def work(sec):
    call(["sleep", str(sec)])

def main():
    seconds = [randrange(1, 5) for i in range(20)]
    pool = WorkerPool(5)
    for sec in seconds:
        pool.add_task(sec)
    pool.complete_work()

if __name__ == '__main__':
    main()

So I run this script on the server:

johanhar@mamadev:~$ python pythonprocesstest.py

And then I check my processes on the server:

johanhar@mamadev:~$ ps -fux

The result from ps looks wrong to me. It looks as if everything is happening under Python in a single process, so it will only get slower the more converts (or sleeps, as in this test case) I have, even though we have several cores on the server...

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
johanhar 24246  0.0  0.0  81688  1608 ?        S    13:44   0:00 sshd: johanhar@pts/28
johanhar 24247  0.0  0.0 108336  1832 pts/28   Ss   13:44   0:00  \_ -bash
johanhar 49753  0.6  0.0 530620  7512 pts/28   Sl+  15:14   0:00      \_ python pythonprocesstest.py
johanhar 49822  0.0  0.0 530620  6252 pts/28   S+   15:14   0:00          \_ python pythonprocesstest.py
johanhar 49824  0.0  0.0 100904   564 pts/28   S+   15:14   0:00          |   \_ sleep 4
johanhar 49823  0.0  0.0 530620  6256 pts/28   S+   15:14   0:00          \_ python pythonprocesstest.py
johanhar 49826  0.0  0.0 100904   564 pts/28   S+   15:14   0:00          |   \_ sleep 3
johanhar 49837  0.0  0.0 530620  6264 pts/28   S+   15:14   0:00          \_ python pythonprocesstest.py
johanhar 49838  0.0  0.0 100904   564 pts/28   S+   15:14   0:00          |   \_ sleep 3
johanhar 49846  0.0  0.0 530620  6264 pts/28   S+   15:14   0:00          \_ python pythonprocesstest.py
johanhar 49847  0.0  0.0 100904   564 pts/28   S+   15:14   0:00              \_ sleep 3

In case the question is still unclear: is this approach what you could call "multi core programming"?


Solution

  • I think you are misreading the ps output. I count 4 distinct Python instances, each of which could, in principle, be allocated to its own core. Whether they actually do get their own core is one of the harder bits of multi-processing.

    Yes, there is the top-level Python process (PID 49753), which is the parent of the sub-processes, but there is also a bash which is parent to that in an analogous way.
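
The ps tree also suggests a simplification: each sleep (or, in the real case, each convert) already runs as its own OS process, so the intermediate Python process per task is probably not needed. A minimal sketch, assuming Python 3 rather than the Python 2 used in the question, and with run_command and run_all as hypothetical helper names that do not come from the question: worker threads call the external command directly, and the thread pool only limits how many child processes are alive at once.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_command(args):
    """Run one external command and return its exit code."""
    # subprocess.call blocks only the calling thread. The child is a
    # separate OS process, so the OS is free to schedule it on any core.
    return subprocess.call(args)


def run_all(commands, max_workers=5):
    """Run every command, keeping at most max_workers children alive at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns the exit codes in the same order as the commands.
        return list(pool.map(run_command, commands))


if __name__ == "__main__":
    # Stand-in tasks (assumption): short sleeps in a child Python interpreter.
    # In the real daemon each entry would be a convert command line instead.
    commands = [
        [sys.executable, "-c", "import time; time.sleep(0.2)"]
        for _ in range(8)
    ]
    print(run_all(commands, max_workers=5))
```

Threads are enough here because the Python side spends its time waiting for the child to exit, and the waiting thread does not hold the interpreter lock while it waits. Whether this is actually faster for real image conversions depends on the number of cores and on disk I/O, so it is worth measuring with a few different max_workers values.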