
Multithreading/processing for loop not working or slower than original def


First off, this is my first attempt at multithreading or multiprocessing (outside of tutorials). I am trying to speed up some initializations in my class using multithreading or multiprocessing; I'm not sure yet which one makes more sense. My code goes something like this:

import threading

classlist = []

filelist = ['a','b','c'] #it's a list of string paths

def loadClasses():
    global classlist
    classlist = [OtherClass(i) for i in filelist]

def threadingfunc():
    t1 = threading.Thread(target=loadClasses)
    t1.start()
    t1.join()

threadingfunc()
        

This seems to take twice as long as just calling loadClasses directly. OtherClass takes about 1.5 seconds to construct, and with about 40 files to load, that adds up.

I attempted the same thing with multiprocessing and it didn't seem to work at all; I have not had any luck getting multiprocessing to work. This is close to what I've used:

from multiprocessing import Pool

classlist = []
def loadClass(file):
    classlist.append(OtherClass(file))

def pool_handler():
    p = Pool(2)
    for file in filelist:
        p.map(loadClass, file)

This took about the same amount of time, so I'm not sure where to go from here. Long story short: I have a list of files I want to load into OtherClass instances, and I'm looking for ways to speed that up. I appreciate any help, and please be nice to a noob!

I've tried the above code chunks with both multithreading and multiprocessing. I was able to get them to work, but I saw no improvement in completion time; some approaches were in fact slower.


Solution

  • python threads

    Using threads in C++ can let you burn all cores and finish faster.

    Using threads in Python is typically only of interest for I/O-bound work, like a web server. Each thread must acquire the GIL before accomplishing any work, so applying Python threads to a compute-bound job typically won't yield a speedup.
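    If constructing OtherClass is compute-bound, a process pool sidesteps the GIL entirely. A minimal sketch, using a stand-in OtherClass (the real one must be picklable so results can travel back to the parent):

```python
from multiprocessing import Pool

class OtherClass:
    # stand-in for the real class; instances must be picklable
    def __init__(self, path):
        self.path = path

def load_class(path):
    return OtherClass(path)

if __name__ == "__main__":
    filelist = ['a', 'b', 'c']
    # each worker constructs instances in its own interpreter,
    # free of contention for the parent's GIL
    with Pool() as p:
        classlist = p.map(load_class, filelist)
    print([c.path for c in classlist])
```

    Note that map returns the results, so there is no need for workers to append to a shared global list.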


    lint

    def loadClass(file):
    

    PEP 8 asked you nicely: spell it load_class, please.

    This is especially salient because the function under consideration is manifestly not a class. No need to provoke LoadClass double-take confusion.


    cores

        p = Pool(2)
    

    It is possible that you're on a Pentium with two cores, but more likely you have more than five cores available. You could specify a larger number, or you might prefer to let it default to the number of cores detected at runtime.
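    For reference, Pool() with no argument sizes itself to os.cpu_count():

```python
import os
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    # no process count given: Pool defaults to os.cpu_count() workers
    print("cores detected:", os.cpu_count())
    with Pool() as p:
        results = p.map(square, range(6))
    print(results)
```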

    overhead: serializing

        for file in filelist:
    

    You mentioned that each "load" operation takes about 1500 msec. That seems a good impedance match for this line. I will just mention that, if each operation ran in a mere fraction of a second, you might consider batching up several file entries and sending them across the pipe connection to each child process in a batch.
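    Pool.map can do that batching for you via its chunksize argument; a sketch with a hypothetical lightweight load function standing in for the real work:

```python
from multiprocessing import Pool

def load(path):
    # hypothetical cheap per-file operation
    return path.upper()

if __name__ == "__main__":
    filelist = [f"file_{i}" for i in range(40)]
    with Pool() as p:
        # chunksize=10 pickles ten paths per round trip to a worker
        # instead of one, amortizing serialization overhead
        results = p.map(load, filelist, chunksize=10)
    print(len(results), results[0])
```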

    overhead: deserializing

            p.map(load_class, file)
    

    I have no idea how big an object your load_class function is appending to that central global list in the parent, which is a bottleneck. Depending on the details, the single parent process can potentially spend a lot of CPU cycles deserializing result values sent by children.

    Often a good strategy will be to do a couple seconds of computation, store a large JSON result in the filesystem, and return either None, or the Path of the result file. That way the parent process doesn't need to burn cycles deserializing some giant JSON result.
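    A sketch of that strategy, with a hypothetical load_class that parks its (pretend-large) result in a temp file and hands back only the small Path:

```python
import json
import tempfile
from multiprocessing import Pool
from pathlib import Path

def load_class(name):
    # stand-in for the expensive load producing a large result
    result = {"source": name, "data": list(range(1000))}
    out = Path(tempfile.gettempdir()) / f"{name}.json"
    out.write_text(json.dumps(result))
    # the parent only deserializes a tiny Path, not the payload
    return out

if __name__ == "__main__":
    with Pool() as p:
        paths = p.map(load_class, ["a", "b", "c"])
    print([p.name for p in paths])
```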

    result order

    It appears you don't care when things happen as long as they do eventually happen, so you might be interested in the several pool.map() variants, including imap_unordered().

    The detail here is that jobs may take different amounts of time. Relaxing constraints on the order in which results are delivered lets the multiprocessing library schedule jobs more aggressively, keeping more cores busy most of the time, even in the presence of stragglers.
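    A sketch of imap_unordered: results stream back in completion order rather than submission order:

```python
from multiprocessing import Pool

def work(n):
    return n * n

if __name__ == "__main__":
    with Pool() as p:
        # whichever worker finishes first delivers first;
        # the order of results is not guaranteed
        results = list(p.imap_unordered(work, range(8)))
    print(sorted(results))
```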

    stragglers

    If you know things that the multiprocessing library does not, you should expose such knowledge so it can schedule tasks more sensibly.

    For example, it might be the case that "long file" implies "long time to load file". If you bring that knowledge to the table, let the scheduler know about it. Given a Path, it is easy to ask for its st_size. Use the key argument of sorted to order your forty files by decreasing size. Then the big ones load first, and idle cores will see hardly any straggler tasks at the end.
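    A sketch of that size-first ordering, over a hypothetical temp directory of files:

```python
import tempfile
from pathlib import Path

# hypothetical setup: a few files of different sizes
tmp = Path(tempfile.mkdtemp())
for name, size in [("small.txt", 10), ("big.txt", 1000), ("mid.txt", 100)]:
    (tmp / name).write_bytes(b"x" * size)

filelist = list(tmp.iterdir())
# biggest (presumably slowest-to-load) files go to the workers first
ordered = sorted(filelist, key=lambda p: p.stat().st_size, reverse=True)
print([p.name for p in ordered])
```

    The ordered list can then be fed straight to pool.imap_unordered.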