I want a daemon that finds images that need to be converted into web and thumbnail versions. I thought Python could be useful here, but I'm not sure I'm doing things right. I want to convert 8 photos simultaneously, and the queue of images to convert can be very long. We have several cores on the server, and spawning each convert in a new process should let the OS make use of the available cores, so things should go faster, right? That is the key point here: to spawn a process from Python that in turn calls ImageMagick's convert, and hope that things go a bit faster than running one convert after another from the Python main thread.
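Something like this is what I'm ultimately aiming for per image (a rough sketch only; the geometries and output names are made up, and it assumes ImageMagick's convert is on the PATH):

    from subprocess import call

    def convert_image(src):
        # Hypothetical sizes and naming scheme, just to illustrate the calls.
        call(["convert", src, "-resize", "1024x1024", src + ".web.jpg"])
        call(["convert", src, "-thumbnail", "200x200", src + ".thumb.jpg"])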
So far I have only started testing. Here is my test code: it creates 20 tasks (each of which is to sleep between 1 and 4 seconds, standing in for a convert) and hands those tasks to a pool with a total of 5 worker threads.
from multiprocessing import Process
from subprocess import call
from random import randrange
from threading import Thread
from Queue import Queue


class Worker(Thread):
    def __init__(self, tid, queue):
        Thread.__init__(self)
        self.tid = tid
        self.queue = queue
        self.daemon = True  # die together with the main thread
        self.start()

    def run(self):
        while True:
            sec = self.queue.get()
            print "Thread %d sleeping for %d seconds\n\n" % (self.tid, sec)
            # Hand the actual work to a separate OS process.
            p = Process(target=work, args=(sec,))
            p.start()
            p.join()
            self.queue.task_done()


class WorkerPool:
    def __init__(self, num_workers):
        self.queue = Queue()
        for tid in range(num_workers):
            Worker(tid, self.queue)

    def add_task(self, sec):
        self.queue.put(sec)

    def complete_work(self):
        self.queue.join()  # block until every task is marked done


def work(sec):
    call(["sleep", str(sec)])


def main():
    seconds = [randrange(1, 5) for i in range(20)]
    pool = WorkerPool(5)
    for sec in seconds:
        pool.add_task(sec)
    pool.complete_work()


if __name__ == '__main__':
    main()
So I run this script on the server:
johanhar@mamadev:~$ python pythonprocesstest.py
And then I check my processes on the server:
johanhar@mamadev:~$ ps -fux
The result from ps looks wrong to me. It looks as if everything is happening under one Python process, so things will only get slower the more converts (or sleeps, as in this test case) I have, even though we have several cores on the server...
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
johanhar 24246 0.0 0.0 81688 1608 ? S 13:44 0:00 sshd: johanhar@pts/28
johanhar 24247 0.0 0.0 108336 1832 pts/28 Ss 13:44 0:00 \_ -bash
johanhar 49753 0.6 0.0 530620 7512 pts/28 Sl+ 15:14 0:00 \_ python pythonprocesstest.py
johanhar 49822 0.0 0.0 530620 6252 pts/28 S+ 15:14 0:00 \_ python pythonprocesstest.py
johanhar 49824 0.0 0.0 100904 564 pts/28 S+ 15:14 0:00 | \_ sleep 4
johanhar 49823 0.0 0.0 530620 6256 pts/28 S+ 15:14 0:00 \_ python pythonprocesstest.py
johanhar 49826 0.0 0.0 100904 564 pts/28 S+ 15:14 0:00 | \_ sleep 3
johanhar 49837 0.0 0.0 530620 6264 pts/28 S+ 15:14 0:00 \_ python pythonprocesstest.py
johanhar 49838 0.0 0.0 100904 564 pts/28 S+ 15:14 0:00 | \_ sleep 3
johanhar 49846 0.0 0.0 530620 6264 pts/28 S+ 15:14 0:00 \_ python pythonprocesstest.py
johanhar 49847 0.0 0.0 100904 564 pts/28 S+ 15:14 0:00 \_ sleep 3
So, in case the problem or what I'm asking isn't clear: is this approach what you could call "multi-core programming"?
I think you are misreading the ps output. I count 4 distinct Python instances, each of which could, in principle, be allocated to its own core. Whether they actually do each get their own core is one of the harder parts of multi-processing.
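You can also confirm that the children are separate processes by printing PIDs from the worker function; a quick sketch (just your work() with os.getpid() added):

    import os
    import time
    from multiprocessing import Process

    def work(sec):
        # Each child reports its own PID, distinct from the parent's.
        print "child %d of parent %d sleeping %d sec" % (os.getpid(), os.getppid(), sec)
        time.sleep(sec)

    if __name__ == '__main__':
        print "parent is %d" % os.getpid()
        procs = [Process(target=work, args=(s,)) for s in (1, 2, 3)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()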
Yes, there is the top-level Python process (PID 49753) which is the parent of the sub-processes, but there is also a bash that is parent to it in an analogous way.
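As an aside, the thread-per-worker wrapper isn't strictly needed: multiprocessing.Pool gives you the same fan-out across processes with less code. A minimal sketch of the same sleep benchmark (not your daemon, just the test):

    from multiprocessing import Pool
    from subprocess import call
    from random import randrange

    def work(sec):
        call(["sleep", str(sec)])

    if __name__ == '__main__':
        seconds = [randrange(1, 5) for i in range(20)]
        pool = Pool(processes=5)  # five worker processes, no threads needed
        pool.map(work, seconds)   # blocks until every task has finished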