Search code examples
multithreadingalgorithmartificial-intelligencechessalpha-beta-pruning

How do I write a Multi threaded Alpha-Beta Search algorithm?


I'm trying to create a chess engine using an alpha beta minimax search algorithm, but the code is too slow. I've done all the optimisations I can think of, but it is still very slow in a single thread. I looked at the source code of some other engines to see how they do it and the chess programming wiki (https://www.chessprogramming.org/Parallel_Search#Parallel_Alpha-Beta), but the the code is beyond my level and I don't understand them. I couldn't find any written sources or code snippets either.

Can someone explain how to efficiently implement threading in an alpha-beta search algorithm? Thanks.


Solution

  • Alpha-beta is an inherently sequential algorithm as your alpha and beta values get updated continuously thorough the search and cutoffs are decided based on these values. For this reason getting any speedups by increasing the amount of threads is very hard, and the more threads you throw at it, the smaller the gains will be.

    However there's still several ways to do it, most of them fairly complicated and they scale extremely poorly with more threads. The go to algorithm used to be the Young Brothers wait concept, it is a fairly complicated algo and it was used for example by Stockfish until a few years back. However with the increasing amount of cores available on modern computers the scaling was very poor and the code very complex. Today most modern engines use something called Lazy SMP. This algorithm is almost as simple as it can be and scales better than the others.

    In lazy SMP all you have to do is start the exact same search as you would normally do, just on multiple threads. It relies on having a working transposition table through which the threads communicate with each other. The threads will never be exactly in sync and the randomness will lead each thread to explore slightly different parts of the search tree and then save their results into the transposition table, where it might be used by another thread. Of course there is a lot of repeating work done by each thread, however it is still better than trying to be clever about splitting the work and slowing down the algorithm, and this is especially true when you start scaling up the amount of threads.

    I recommend you take a look at the chess programming wiki, where you can even find some pseudo code on how to implement it. https://www.chessprogramming.org/Lazy_SMP

    Though i should also point out that if what you are looking for is improving your time to depth, implementing multithreading won't do all that much for you (and in some extreme cases it might actually even slow it down!). What you need instead is more aggresive pruning of the search tree and more efficient implementation (eg. no memory allocations, so the garbage collector never has to run, etc.).