Is treating two uint8_ts as a uint16_t less efficient

Suppose I created a class that took a template parameter equal to the number uint8_ts I want to string together into a Big int.

This way I can create a huge int like this:

SizedInt<1000> unspeakablyLargeNumber;  //A 1000 byte number

Now the question arises: am I killing my speed by using uint8_ts instead of using a larger built in type.

For example:

SizedInt<2> num1;
uint16_t num2;

Are num1 and num2 the same speed, or is num2 faster?

Solution

It would undoubtedly be slower to use uint8_t[2] instead of uint16_t.

Take addition, for example. In order to get the uint8_t[2] speed up to the speed of uint16_t, the compiler would have to figure out how to translate your add-with-carry logic and fuse those multiple instructions into a single, wider addition. I'm sure that some compilers out there are capable of such optimizations sometimes, but there are many circumstances which could make the optimization unlikely or impossible.

On some architectures, this will even apply to loading / storing, since uint8_t[2] usually has different alignment requirements than uint16_t.

Typical bignum libraries, like GMP, work on the largest words that are convenient for the architecture. On x64, this means using an array of uint64_t instead of an array of something smaller like uint8_t. Adding two 64-bit numbers is quite fast on modern microprocessors, in fact, it is usually the same speed as adding two 8-bit numbers, to say nothing of the data dependencies that are introduced by propagating carry bits through arrays of small numbers. These data dependencies mean that you will often only be add one element of your array per clock cycle, so you want those elements to be as large as possible. (At a hardware level, there are special tricks which allow carry bits to quickly move across the entire 64-bit operation, but these tricks are unavailable in software.)

If you desire, you can always use template specialization to choose the right sized primitives to make the most space-efficient bignums you want. Otherwise, using an array of uint64_t is much more typical.

If you have the choice, it is usually best to simply use GMP. Portions of GMP are written in assembly to make bignum operations much faster than they would be otherwise.