For the purpose of backtesting a trading strategy based on a machine learning model, I want to run the model's retraining procedure in parallel.

The setup is implemented in Python using scikit-learn. The backtesting procedure is defined as follows: make a prediction for the next data point, then retrain the model on that data point, and repeat until the current day ( the last data point ) is reached. For some stock predictions this can amount to 5000 data points, which means that, starting the backtest with a model trained on the first 900 data points, I need to predict and retrain the model 4100 times.

To parallelize the procedure I am using multiprocessing. I have access to a server that provides 40 CPU cores. So what I am doing is:
Python Code:

    from multiprocessing import Pool

    pool = Pool()            # defaults to one worker process per CPU core
    results = []
    for i in range(len(parts)):
        try:
            # parts[i + 1] raises IndexError on the last element,
            # which ends the submission loop
            result = pool.apply_async(classify_parts, [parts[i], parts[i + 1]])
            results.append(result)
        except IndexError:
            continue

    for result in results:
        result.get()         # block until every submitted range has finished
The method classify_parts starts the procedure for the given range. For instance, if I have 500 data points and start the whole backtest by training the model on the first 100 data points, I have 400 data points ( days ) left for the backtesting, which means calls like:

classify_parts( 10, 20 ), ... , classify_parts( 390, 400 )
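The construction of the parts list itself is not shown in the question; a minimal sketch of how such boundaries could be built, assuming 10-day ranges over the 400 remaining days ( the names step and n_backtest_days are illustrative ):

```python
# Hypothetical reconstruction of the `parts` boundaries: with 500 data
# points and the first 100 used for initial training, the remaining 400
# days are split into ranges of `step` days each.
step = 10
n_backtest_days = 400

parts = list(range(0, n_backtest_days + 1, step))
# parts == [0, 10, 20, ..., 400]

# consecutive pairs give the (start, end) arguments for classify_parts:
pairs = list(zip(parts[:-1], parts[1:]))
# pairs == [(0, 10), (10, 20), ..., (390, 400)]
```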
Hopefully I have illustrated my concept in a clear manner.

So my biggest question is: is there another, more efficient way of doing backtesting with a machine learning model that is retrained on every next data point ( day )? With this concept, one backtesting procedure over 5000 data points runs for more than 10 minutes.

Maybe incremental / online learning is the way to go here?
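To make the incremental-learning idea concrete: a sketch using scikit-learn's SGDClassifier and its partial_fit method on synthetic data ( the real features and labels are not shown in the question ), where each new day is first predicted and then learned from:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# synthetic stand-ins for the real stock features / labels
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = SGDClassifier(random_state=0)

# initial fit on the first 100 points; classes must be declared up front
clf.partial_fit(X[:100], y[:100], classes=np.array([0, 1]))

predictions = []
for t in range(100, 500):
    predictions.append(int(clf.predict(X[t:t + 1])[0]))  # predict next day
    clf.partial_fit(X[t:t + 1], y[t:t + 1])              # then retrain on it
```

Because partial_fit updates the existing weights instead of refitting from scratch, each of the 400 steps costs a single gradient update rather than a full retraining pass.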
At first, this is not parallelizing the process scheduling.

If in doubt, one may revisit the theory details of SERIAL, PARALLEL and "just"-CONCURRENT process scheduling before reading further.
Things may go into SERIAL mode of operation by principle ( by intent ) or by some sort of resource-limitation. If one is to type the word "parallel" on a keyboard, the letters p, next a, next r ... etc. have to be touched SERIAL-ly, one after another, otherwise the input process fails to provide the guaranteed, correct result - i.e. the word "parallel" on screen ( and let's assume we forget about a chance to use a Shift key, for the clarity of this example ).
Anyone may test what gets produced if one tries to press all the keys at the same moment, or in pairs, or while a cat is standing on your keyboard ... all additional ideas are welcome for hands-on experimentation.
There is no way to speed up a SERIAL process in any other circumstances than by getting faster processing-resources and/or by lowering inter-process transport-delays ( or in case a new, smarter algorithm exists, is known to the kind programmer, and additionally shows to be feasible to implement on the resources available ).
On the other hand, some things must be organised so as to happen at the opposite end of the spectrum -- all things simply have to start + happen + finish at the same time, otherwise the results of the process are considered wrong. What could be brought as examples?

imagine a few visually simply "readable" examples - the show of the Red Arrows, a group of 9 jet planes, or of the Frecce Tricolori, which display figures and manoeuvres of high pilotage, where all planes move at the same time in exactly the given ( same ) way, otherwise the lovely show would fail.

imagine a concert that has to take place. This is exactly the ultimately PARALLEL process, and one would immediately recognise how devastating it would be if it were not.

imagine a robot-control system for several degrees-of-freedom, where all motion has to be carried out and kept under the control of a closed-loop [ actor - detector - actuator ] subsystem in sync, coordinated across all of the several program-controlled axes. The system must control all the NC-axes together as a PARALLEL process -- otherwise the robot-arm will never be able to follow the black line - the defined, curved, spatio-temporal trajectory in 4-D space ( best viewed in real-time ).
The robot-control example shows that the farther we move from the aerobatics or symphony examples, for which we have some direct experience ( or where one's imagination can provide some comprehension ), the harder PARALLEL process-scheduling may be to sense or to realise; the symphony remains the best one to remember when questioning whether a problem is indeed truly PARALLEL or not.

Just ask yourself what would happen if only half of the ensemble arrives in time to start playing at 19:00 sharp, or if some of the violins play at an accelerated speed ... or if the players each play just one note in sequence, letting the left-side neighbour play their note, and waiting until the order of playing comes back around to perform the next note of the concert. Correct - that would be pure-SERIAL scheduling.
The problem still depends on the processing.

If the learning-method ( in an undisclosed MCVE ) has a built-in dependence - for example, if classify_parts( 20, 30 ) needs to get some input or a piece of information generated in the "previous" step classify_parts( 10, 20 ) - then the answer is no.
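That dependence can be made concrete with a toy sketch ( classify_parts_dependent and its state are hypothetical stand-ins ): if each range must start from the state left behind by the previous range, the calls form a chain and can only run one after another:

```python
def classify_parts_dependent(model_state, start, end):
    # toy stand-in: "retraining" here means absorbing the points of this
    # range into the state that the *next* call will need as its input
    return model_state + list(range(start, end))

state = []  # state after the initial training window
for start, end in [(10, 20), (20, 30), (30, 40)]:
    # each call consumes the previous call's result - no two can overlap
    state = classify_parts_dependent(state, start, end)
```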
If the learning-method is independent, in the above defined sense, then there is a chance to schedule the process as "just"-CONCURRENT and benefit from the fact that there are more "free" lanes on the highway, allowing cars to run in more than one lane and to overtake slower ones on the way from start to finish. A rugby team can move from city A to city B "faster" if there are 2 or 3 or more 4-seat cars available at the city A stadium than if there is just 1, which has to go back and forth to move all the players to city B.

Next, even if there are enough cars for all the players to move to city B, the journey would take uncomfortably long if there were just a one-lane road from A to B, with heavy traffic in the opposite direction from B to A ( not allowing to overtake ) and a single tractor, also going from A to B, but at a speed of 1 mph.

Still, even without any of the above sketched "external" obstacles, one has to be very careful not to lose the benefits available from a "bigger" machine + "wider" highways ( more resources available ) for "just"-CONCURRENT process scheduling.
Yes, the problem here is related to the programming tools.

Python can help a lot with fast prototyping, but beware of the Global Interpreter Lock ( GIL ) and its code-execution implications.

In your case, using a Pool() of 40 threads might look attractive, but the work would still wait on GIL-acquisition/release and ALL THE WORK WILL turn SERIAL again, just due to the omission to notice the known GIL-mechanics on a pool of python-threads.

Yes, there are other, more promising alternatives to a ThreadPool() et al., but not only is the code-related GIL-stepping devastating your efforts to accelerate the speed of the process - shared data-structures are also a problem to watch and to avoid. Additionally, the naively-replicated Pool() environments may grow beyond your system's RAM capacity, and the swapping mechanism will render such an attempt to accelerate the process totally unusable. Share nothing, otherwise your dream of accelerated processing will be lost on sharing-mechanics and you are back at square No. 1.
The best way is to design the code as independent processes that have zero-sharing, non-blocking ( MEM, I/O ) operations.