I have a Python script containing a loop with a lot of calls to scipy.optimize.fsolve
(109 (55 + 54) times per time step, and right now I need around 10^5 time steps). The rest of the script isn't very fast either, but as far as I can tell from the output of the Spyder Profiler, the calls to fsolve are by far the most time-consuming. With my current settings, running the script takes over 10 hours, so I could do with a little speed-up.
Based on what I read here and elsewhere on the internet, I already gave PyPy a first try: I installed it in a separate conda environment (macOS 10.15.5, pypy 7.3.1 and pypy3.6 7.3.1), along with its own versions of numpy, scipy, and pandas, but so far it's actually a bit slower than plain Python (195 s vs 171 s, for 100 time steps).
From what I read here (PyPy Status Blog, October '17), this might have to do with using numpy instead of numpypy, and/or repeated allocation of temporary arrays. Apart from calling fsolve 10 million+ times, I'm also using quite a lot of numpy arrays, so that makes sense as far as I can see.
The thing is, I'm not a developer, and I'm completely new to PyPy, so terms like JIT traces don't mean much to me, and deciphering what's in them is likely going to be challenging. Moreover, what used to be true in October 2017 may no longer be relevant now. Also, even if I managed to speed up the numpy array bit, I'm still not sure about the fsolve part.
Can anyone indicate whether it could be worthwhile for me to invest time in PyPy? Or would Cython be more suitable in this case? Or even mpi4py?
I'd happily share my code if that helps, but it includes a module of over 800 lines, so pasting it all into this post didn't seem like a great idea to me.
Many thanks! Sita
EDIT: Thanks everyone for your quick and kind responses! That's a fair point about needing to see my code; I put it here (link valid until 19 June 2020). Arterial_1D.py is the module, CoronaryTree.py is the script that calls Arterial_1D.py. For a minimal working example, I put in one extra line, to be uncommented in that case (clearly marked in the code). Also, I set the number of time steps to 100, to have the code run in a reasonable time (0.61 s for the minimal example, 37.3 s for the full coronary tree, in my case).
EDIT 2: Silly of me: in my original post I mentioned times of 195 and 171 s for running 100 steps of my code using PyPy and Python, respectively, but in that case I called Python from within the PyPy environment, so it was using the PyPy version of numpy. Running 100 steps from within my base environment takes a little over 30 s. So PyPy is A LOT slower than Python in this case, which motivates me to look into that PyPy Status Blog post anyway.
We can't really help you optimize without looking at your code. But since you have quite the description going on up there, let me reply with what I think you can try to speed things up.
First things first: the Scipy library. Looking at the source for scipy.optimize.fsolve, it wraps MINPACK's hybrd and hybrj algorithms, which are already fast, compiled Fortran subroutines. The time is therefore being spent inside compiled code that PyPy's JIT cannot touch, so in your case switching to PyPy is not going to do much good, if any at all, for this identified bottleneck.
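One thing you can check is how much of the cost sits in your Python residual function: fsolve's full_output flag reports the number of function evaluations, and if you can supply an analytic Jacobian via fprime, you typically cut that count. A minimal sketch, with a toy residual standing in for your actual equations:

```python
import numpy as np
from scipy.optimize import fsolve

# Toy 2x2 system standing in for one of your per-step solves
# (hypothetical; your residual will differ).
def residual(x):
    return np.array([x[0] ** 2 + x[1] - 3.0,
                     x[0] - x[1] ** 2 + 1.0])

def jacobian(x):
    # Analytic Jacobian of the residual above.
    return np.array([[2.0 * x[0], 1.0],
                     [1.0, -2.0 * x[1]]])

x0 = np.array([1.0, 1.0])

# full_output=1 returns an info dict; 'nfev' counts calls to the
# Python residual function, which is where the Python-side cost sits.
root, info, ier, msg = fsolve(residual, x0, full_output=1)
print(info['nfev'])  # evaluations with a finite-difference Jacobian

# Supplying fprime avoids the finite-difference Jacobian estimates,
# typically reducing the number of residual evaluations.
root, info, ier, msg = fsolve(residual, x0, fprime=jacobian, full_output=1)
print(info['nfev'], info['njev'])
```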
What can you do to optimize your Scipy fsolve? One of the most obvious numerical things to do is to vectorize your function's args. But it seems that you are running a time-stepping algorithm, and most standard time-stepping algorithms cannot be vectorized in time. If your 'XX times per time step' is really an implicit spatial loop per time step (i.e. over your grid), you can consider vectorizing that loop to achieve some gains in speed. Next, zoom in on your function's initial guess / starting root estimate. See if you can modify your algorithm to capitalize on a good starting solution over the whole time interval (do some literature digging); see the sketch below. Note that this has less to do with the 'programming' than with your knowledge of numerical methods.
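To make the warm-start idea concrete: if the solution varies smoothly from one time step to the next, feeding each fsolve call the solution from the previous step as its initial guess can cut the iteration count substantially. A minimal sketch, with hypothetical names and a placeholder residual where your Arterial_1D equations would go:

```python
import numpy as np
from scipy.optimize import fsolve

n_segments, n_steps, n_unknowns = 109, 100, 2  # stand-in sizes

def residual(x, seg, t):
    # Placeholder residual with a smoothly varying parameter;
    # your actual per-segment equations would go here.
    a = 1.0 + t + 0.01 * seg
    return np.array([x[0] ** 2 - a, x[0] * x[1] - 1.0])

# One guess vector per segment, carried across time steps.
guesses = np.ones((n_segments, n_unknowns))

for step in range(n_steps):
    t = step * 1e-3
    for seg in range(n_segments):
        # Warm start: last step's solution is usually an excellent
        # initial guess, so the solver converges in fewer iterations.
        guesses[seg] = fsolve(residual, guesses[seg], args=(seg, t))
```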
Next, on your comment that the "rest of the script isn't very fast either": I'd go with Cython to sweat the remaining Python parts of your code, especially the loops. It is very actively developed, has a great community, and is battle-tested; I have personally used it in many HPC-type problems. Cython also has a convenient HTML annotation mode that highlights potential optimizations over your native Python implementation; see the sketch below.
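Here is a minimal sketch in Cython's "pure Python mode" (file name, function, and update rule are all hypothetical stand-ins): the same file runs under plain Python and compiles with `cythonize -i arterial_fast.py`, while `cython -a arterial_fast.py` produces the HTML annotation mentioned above.

```python
# arterial_fast.py -- hypothetical sketch of moving a hot loop to Cython
# using "pure Python mode": valid Python, but compiles to C with Cython.
import cython
import numpy as np

@cython.boundscheck(False)  # assumes all indices stay in range
@cython.wraparound(False)   # assumes no negative indexing
def advance_pressure(p: cython.double[:], dt: cython.double) -> None:
    # Typed loop variables let Cython compile this into a plain C loop
    # instead of Python-level iteration; this is where the gains come from.
    i: cython.int
    n: cython.int = p.shape[0]
    for i in range(1, n - 1):
        p[i] = p[i] + dt * (p[i - 1] - 2.0 * p[i] + p[i + 1])

if __name__ == "__main__":
    p = np.linspace(0.0, 1.0, 1000)
    advance_pressure(p, 1e-3)  # works both interpreted and compiled
```

The annotation HTML colors each line by how much Python-API overhead it still incurs, so you can see at a glance which loops are worth typing.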
Hope this helps! Cheers