Search code examples
cbit-manipulationbitwise-operatorsbit-shiftsize-t

Is it proper to use size_t for count in a bitshift operation?


This small, seemingly insignificant question, occurred to me recently, but after googling quite a bit, I have been unable to find even an opinion on the subject. Only loops and object sizes are referred to.

I know people like examples, so here the one that cause the question in the first place:

uint64_t deltaSwap( const uint64_t b, const size_t delta, uint64_t mask )
{
    return b ^ mask ^ ( mask &=  b ^ b >> delta ) << delta;
}

I've been trying too optimize this for a while, knowing full well that this is not the way to write proper code, though it has given me the best result so far, at least with GCC, and then it occurred to me. If you really want to be pedantic, shouldn't delta be of type size_t?

I never really understood when to use size_t, so I never really do, but if I were to, wouldn't this be correct usage?

Update: Here's the short explanation of what it does, although not how it does so, as I'm not really sure how to explain it:

It is a standard delta swap, which is not a new ideal, and the code works fine, this isn't really about the code (but since you asked), and all I've really done is experimenting with it, to achieve the best performance, and the version you see here, is my best result so far.

The purpose of the code is to swap two or more bits, fx if you wish to swap the first and the last bit, it could be done thusly:

deltaSwap(b, 63, 0x0000000000000001);

or if you wish to reverse the order of the bits:

deltaSwap(b, 32, 0x00000000ffffffff);
deltaSwap(b, 16, 0x0000ffff0000ffff);
deltaSwap(b,  8, 0x00ff00ff00ff00ff);
deltaSwap(b,  4, 0x0f0f0f0f0f0f0f0f);
deltaSwap(b,  2, 0x3333333333333333);
deltaSwap(b,  1, 0x5555555555555555);

though for this particular task, deltaswaps are probably not the best way to go.

Update 2: Just for completion, this is the most correct onliner I can think of (haven gotten my answer), and the compiler apparently optimizes it perfectly.

uint64_t deltaSwap( const uint64_t b, const uint_fast8_t delta, const uint64_t mask )
{
    return b ^ ( mask & ( b ^ b >> delta ) ) ^ ( mask & ( b ^ b >> delta ) ) << delta;
}

I would've shortened the variable name to get it all to fit within the 80 character with imposed by my ocd brain (and apparently this site as well), but for you all, I'm willing to suffer.


Solution

  • If you really want to be pedantic, shouldn't delta be of type size_t?

    No, if you really want to be pedantic, delta should just be an unsigned integer type capable of holding at least the range of values from 0 to sizeof(uint64_t) * CHAR_BIT. In your case that's [0, 63]. There's no real need to have it be size_t.

    I never really understood when to use size_t, so I never really do, but if I were to, wouldn't this be correct usage?

    In terms of correctness of code, it's ok. In terms of optimization it doesn't make much sense. size_t is used to hold a size, because it's a type guaranteed to be capable to hold the biggest possible size of an object. It's certainly not guaranteed to be any faster than a normal unsigned or any other unsigned integer type (see bottom of the answer for this).

    Another important thing to notice is that this:

     b ^ mask ^ ( mask &=  b ^ b >> delta ) << delta
    

    Is undefined behavior according to the C standard, since you are using the value of a variable while also applying a side effect to it in the same statement (see paragraph 6.5 point 2, page 76 here).

    The correct way to do what you want is:

    mask &= b ^ (b >> delta);
    return b ^ (mask ^ (mask << delta));
    

    In any case always take extra care when working with bitshift operators since they have precedence over other bit operators. Using extra parentheses or splitting expressions into multiple lines doesn't hurt performance and improves readability. A decent compiler will optimize the above without any problem.


    Now, back to the real reason you're asking this question: optimization.

    The correct way to optimize your code is to let the compiler choose the optimal size of the variable that you need. For this purpose, you only need one byte, and you can use uint_fast8_t from stdint.h, which is the fastest (implementation defined) unsigned integer type with width of at least 8 bits. The compiler will choose the fastest width for its purpose.

    With the above being said, the correct way to optimize your code is:

    uint64_t deltaSwap( const uint64_t b, const uint_fast8_t delta, uint64_t mask )
    {
        mask &= b ^ (b >> delta);
        return b ^ (mask ^ (mask << delta));
    }
    

    Depending on what you're doing it might as well make sense to declare the function as inline __attribute__ ((always_inline)) if GCC does not already inline the code for you, though the compiler is usually better at figuring out when to inline code and when not. Your function is most probably already going to be inlined.

    One more important thing: using the right optimization flags often matters more than hand-tuning code. For example, for the above code, you might want to compile with -Ofast -march=native, and maybe even other flags depending on where you use the function (e.g. -ftree-vectorize if used in a loop).

    Other than the above: benchmarking, switching to hand-tuned assembly with an asm() statement and looking at the surrounding code are the only ways to optimize the above further, assuming that the formula is already simplified to its core.