
"Branch Predictions" and Markov Chain in High-Level Languages


Some functions benefit from a branch prediction system that allows faster computation for certain data structures; e.g. running sorted data through a loop containing an if statement is faster than processing the same data unsorted (see the following marvelous post for further information on branch prediction: Why is it faster to process a sorted array than an unsorted array?).
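The classic experiment from that post can be sketched in Python. A caveat worth hedging: CPython's interpreter overhead usually dwarfs hardware branch-prediction effects, so the timing gap is far less dramatic here than in compiled languages like C++ or Java; the sketch mainly shows the shape of the experiment.

```python
import random
import timeit

data = [random.randrange(256) for _ in range(100_000)]
sorted_data = sorted(data)

def conditional_sum(values):
    total = 0
    for v in values:
        if v >= 128:        # this branch is what the predictor learns
            total += v
    return total

# Both calls compute the same result; only the branch pattern differs.
assert conditional_sum(data) == conditional_sum(sorted_data)

t_unsorted = timeit.timeit(lambda: conditional_sum(data), number=20)
t_sorted = timeit.timeit(lambda: conditional_sum(sorted_data), number=20)
print(f"unsorted: {t_unsorted:.3f}s  sorted: {t_sorted:.3f}s")
```

In C or Java the sorted version can be several times faster, because the hardware predictor guesses the `if` correctly in two long runs instead of being wrong about half the time.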

My question is fairly simple: do high-level programming languages such as C, Python, MATLAB, R, etc. use Markov chain predictions to increase the speed of certain calculations?

I am no expert and I am just starting to learn about Markov chains, but it seems to me that most code, especially code containing loops, has some sort of temporally predictable events that could be used to speed up calculations.

An example would be a function applied to a matrix that contains integers between 1 and 200, where 90% of the values are lower than 100, i.e. have 2 digits. It seems to me that automatically "pre-allocating" a certain number of bits based on the probability of having an x- or y-digit integer could increase performance.

Is that how branch prediction works? Does it extend to calculations in general?


Solution

  • About BigInteger

Theoretically, that would be correct if the cost of an integer multiplication were actually an increasing function of bit size, that is, if integer multiplication were implemented as a loop with an if-statement. However, computations on 32-bit integers cost the 'same' as on 16-bit or 64-bit integers on current-gen 64-bit CPUs, since they all use the same adder gates implemented in the hardware. A more involved discussion is found at Performance 32 bit vs. 64 bit arithmetic

Instead, you will incur extra overhead trying to actually store those partially sized values (e.g. bit-fields, bit-vectors) if you do so manually.

Let's say for now we are talking about very large integers, think java.math.BigInteger. Then you are correct in saying that you may gain a performance benefit by pre-allocating a given number of bits instead of starting with one unit ('digit') and creating new 'digits' as you perform operations. Just take a look at the OpenJDK Implementation of BigInteger.multiply(); that is exactly what it does in its multiply method. This idea is also, in some ways, related to the Pre-allocation performance tip for MATLAB. Do not unnecessarily branch on OutOfBounds when you can just as easily pre-allocate an array.
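The pre-allocation idea can be sketched with a schoolbook multiply over digit arrays. This is an illustrative sketch, not the OpenJDK code: an n-digit times m-digit product has at most n + m digits, so the result is allocated once at that size instead of growing (and bounds-branching) digit by digit.

```python
def multiply_magnitudes(x, y):
    """Schoolbook multiply of two little-endian base-10 digit arrays,
    pre-allocating the result the way BigInteger.multiply() pre-sizes
    its result magnitude array."""
    # n-digit * m-digit product fits in n + m digits: allocate once.
    result = [0] * (len(x) + len(y))
    for i, xd in enumerate(x):
        carry = 0
        for j, yd in enumerate(y):
            t = result[i + j] + xd * yd + carry
            result[i + j] = t % 10
            carry = t // 10
        result[i + len(y)] += carry
    return result

# 12 * 34 = 408; digits are stored least-significant first.
digits = multiply_magnitudes([2, 1], [4, 3])
```

Because the output buffer is sized up front, the inner loop never has to test "do I need another digit?", which is exactly the branch the pre-allocation removes.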

    In Practice

    Now to directly answer your question of whether languages take advantage of Markov chains, I believe not. No language source I've ever seen (or engine, for that matter) maintains a true Markov chain. Most languages at a high level are meant as general-purpose in their context, and so predictions of state would add considerable overhead to any sort of calculations. Coming back to the java.math.BigInteger example, it is almost always faster to just allocate the sum of the digits than to use a predictive mechanism that stores state. It might save memory in the result, but it is detrimental to performance. The CPU itself relies on state machines (can be thought of as Markov chains) to perform predictive optimizations, but that is because it can afford to without a noticeable cost detriment (see the wiki on branch prediction or an example in java hardware array prefetching).

That being said, the idea of temporally predictable events is the crux of improving performance in general and is to be taken advantage of; it is not limited to branch prediction. Organization of data is critical to performance. Examples include Generational Garbage Collection in .NET, Efficient Management of Particles in Particle FX and Web Server Predictive Prefetching. The last one actually uses a Markov predictor to determine which page to prefetch into the cache as a function of user logs. Many other examples can be found in game engine and runtime implementations of smart organization of data in order to beat branch prediction or just altogether skip it!

    tldr;

In practical development, most optimizations rely on making simplifying assumptions about your data rather than applying a Markov predictor. So if you wish to take advantage of branch prediction, know your data and organize it well. That will either improve your prediction or allow you to skip it altogether. Using a predictor will likely cost you more in development and execution time. If you still feel a predictor might help, maintain a simple state machine and benchmark it against the naive solution.
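Such a benchmark can be sketched with the classic 2-bit saturating counter, the state machine real branch predictors are built from, run over sorted versus shuffled branch outcomes. States 0-1 predict "not taken", 2-3 predict "taken"; the counter saturates at both ends.

```python
import random

def predictor_accuracy(outcomes):
    """Fraction of branch outcomes a 2-bit saturating counter predicts."""
    state = 2                       # start weakly "taken"
    correct = 0
    for taken in outcomes:
        predicted = state >= 2
        if predicted == taken:
            correct += 1
        # move toward the actual outcome, saturating at 0 and 3
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)

data = [random.randrange(256) for _ in range(10_000)]
outcomes_unsorted = [v >= 128 for v in data]
outcomes_sorted = [v >= 128 for v in sorted(data)]

acc_sorted = predictor_accuracy(outcomes_sorted)
acc_unsorted = predictor_accuracy(outcomes_unsorted)
```

On sorted data the outcomes form two long runs and the counter is nearly always right; on shuffled data it hovers near a coin flip, which is exactly the "know your data and organize it well" point above.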