First off, this is my first attempt at multithreading or multiprocessing (outside of tutorials). I am trying to speed up some initializations in my class using multithreading or multiprocessing; I'm not sure yet which one makes more sense. My code goes something like this:
import threading

classlist = []
filelist = ['a', 'b', 'c']  # it's a list of string paths

def loadClasses():
    # rebuild the global list from the module-level filelist
    global classlist
    classlist = [OtherClass(i) for i in filelist]

def threadingfunc():
    t1 = threading.Thread(target=loadClasses)
    t1.start()
    t1.join()

threadingfunc()
This seems to take twice as long compared to just calling the loadClasses function directly.
OtherClass takes about 1.5 seconds to construct, and when I have about 40 files to load, that adds up.
I also attempted the same thing with multiprocessing, but I have not had any luck making anything work. This is close to what I've used:
from multiprocessing import Pool

classlist = []

def loadClass(file):
    classlist.append(OtherClass(file))

def pool_handler():
    p = Pool(2)
    for file in filelist:
        p.map(loadClass, file)
This took about the same amount of time, so I'm not sure where to go from here.
Long story short, I have a list of files I want to load into my OtherClass and I'm looking for ways to speed that up.
I appreciate any help, and please be nice to a newbie!
I've tried the above code chunks with both multithreading and multiprocessing. I was able to get them to work, but I saw no improvement in completion time; some attempts were in fact slower.
Using threads in C++ can let you burn all cores and finish faster. Using threads in Python is typically only of interest for I/O-bound workloads like a web server. Each thread must acquire the GIL before accomplishing any work, so applying Python threads to a compute-bound job typically won't be a win.
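Here is a minimal sketch to see that effect for yourself (busy_work is a hypothetical stand-in for a compute-bound load; the numbers are just illustrative). On a typical machine the thread pool finishes no faster than serial code, while the process pool spreads the work across cores.

import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def busy_work(n):
    # compute-bound: holds the GIL for the whole loop
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    args = [2_000_000] * 8
    for pool_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        t0 = time.perf_counter()
        with pool_cls(max_workers=4) as pool:
            list(pool.map(busy_work, args))
        print(pool_cls.__name__, round(time.perf_counter() - t0, 2), 'sec')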
def loadClass(file):

PEP 8 asked you nicely: spell it load_class, please. This is especially salient because the function under consideration is manifestly not a class. No need to provoke LoadClass double-take confusion.
p = Pool(2)
It is possible that you're on a Pentium with two cores, but more likely you have more than five cores available. You could specify a larger number, but you might prefer to let it default to the number of cores detected at runtime.
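For example (a sketch; load_class and filelist are from your post, with the rename suggested above, and load_class would need to return OtherClass(file) rather than append to a global, since each child process gets its own copy of classlist):

import os
from multiprocessing import Pool

if __name__ == '__main__':
    print(os.cpu_count())     # how many workers Pool() creates by default
    with Pool() as p:         # no argument: one worker per detected core
        classlist = p.map(load_class, filelist)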
for file in filelist:
You mentioned that each "load" operation takes about 1500 msec. That seems a good impedance match for this line. I will just mention that, if each operation ran in a mere fraction of a second, then you might consider batching up several file entries in a tuple and sending them across the pipe connection to each child process in a batch, as sketched below.
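You wouldn't even need to build those tuples by hand; Pool.map accepts a chunksize argument that does the batching for you. A sketch (chunksize=4 is an arbitrary illustrative value):

from multiprocessing import Pool

if __name__ == '__main__':
    with Pool() as p:
        # each worker receives batches of 4 paths per pipe round trip
        results = p.map(load_class, filelist, chunksize=4)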
p.map(load_class, file)
I have no idea how big an object your load_class function is appending to a central bottleneck global list in the parent. Depending on the details, that one parent process can potentially spend a lot of CPU cycles deserializing result values sent by children.
Often a good strategy will be to do a couple seconds of computation, store a large JSON result in the filesystem, and return either None or the Path of the result file. That way the parent process doesn't need to burn cycles deserializing some giant JSON result.
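A sketch of that shape, assuming a scratch directory and a hypothetical to_dict() serializer on your class:

import json
from pathlib import Path

RESULT_DIR = Path('results')           # assumed scratch directory
RESULT_DIR.mkdir(exist_ok=True)

def load_class(file):
    obj = OtherClass(file)             # the expensive part, done in the child
    out = RESULT_DIR / (Path(file).name + '.json')
    out.write_text(json.dumps(obj.to_dict()))   # to_dict() is hypothetical
    return out                         # a small Path is cheap for the parent to unpickle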
It appears you don't care when things happen, as long as they do eventually happen, so you might be interested in the several pool.map() variants, including imap_unordered(). The detail here is that jobs may take different amounts of time. Relaxing constraints on the order in which results are delivered lets the multiprocessing library schedule jobs more aggressively, keeping more cores busy most of the time, even in the presence of stragglers.
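A sketch of consuming results as soon as any worker finishes (load_class and filelist as above):

from multiprocessing import Pool

if __name__ == '__main__':
    with Pool() as p:
        # results arrive in completion order, not submission order
        for result_path in p.imap_unordered(load_class, filelist):
            print('finished:', result_path)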
If you know things that the multiprocessing library does not, you should expose such knowledge so it can schedule tasks more sensibly.
For example, it might be the case that "long file" implies "long time to load file". If you bring that knowledge to the table, then let the scheduler know about it. Given a Path it is easy to ask for its st_size. Use the key argument of sorted to order your forty files by decreasing size. Then we load the big ones first, and idle cores will see hardly any straggler tasks at the end.
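A sketch, assuming file size really does predict load time:

from multiprocessing import Pool
from pathlib import Path

if __name__ == '__main__':
    # biggest (presumably slowest) files first, so no large straggler
    # gets dispatched near the end of the run
    by_size = sorted(filelist, key=lambda f: Path(f).stat().st_size, reverse=True)
    with Pool() as p:
        results = list(p.imap_unordered(load_class, by_size))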