Tags: c, performance, assembly, branchless

How can I make branchless bounds checking?


I'm trying to create a type of safe buffer that automatically handles overflow without any branching. The buffer size is a power of two and shall only have valid positive (i.e. not including zero) indices. It also allows checked removal, which is removal at a given index if the element stored at that index is equal to a search key.

I was essentially going for something like this:

Element *buffer[256];

inline void buffer_insert(size_t index, Element *elem){
  buffer[index < 256 && index] = elem;
}

//Optional: checked insert to prevent overwrite. Will only insert
//if the buffer holds NULL at index.
inline void buffer_checkedInsert(size_t index, Element * elem){
  buffer[index && !buffer[index < 256 && index]] = elem;  
}

inline void buffer_checkedRemove(size_t index, Element *elem){
  buffer[0] = NULL; //Maybe useful if buffer[0] stores elem
  buffer[(elem == buffer[index < 256 && index]) && index] = NULL;
}

So I basically want to access index 0 whenever the index passed in is out of bounds, as buffer[0] is not a valid buffer index. And I also want to access index 0 whenever the element to be removed is not equal to the element that is passed into the removal, and I might want to also access index 0 if the buffer contains something at index.

My questions are:

  • Is what I have really branchless? Because if the C compiler decides to use short-circuiting on &&, the code might get branched.
  • If && causes branching, is there an alternative that has the same behavior in this case that does not involve branching?
  • Can this be faster than a basic overflow check? Or could the C compiler somehow give a branchless version of if(index < 256) buffer[index] = elem?

Solution

  • Is what I have really branchless? Because if the C compiler decides to use short-circuiting on &&, the code might get branched.

    Maybe. The compiler might be clever enough to emit branchless machine code in these cases, but you cannot rely on it.

  • If && causes branching, is there an alternative that has the same behavior in this case that does not involve branching?

    Your question is a bit confused. The fact that a compiler may emit branching code to implement the && operation follows from the defined behavior of that operation: the right-hand operand is evaluated only if the left-hand operand compares unequal to zero. Any alternative with the same behavior must afford the same possibility of branching.

    On the other hand, if you mean to ask whether there is an alternative that computes the same result in all cases, then yes, you can rewrite those expressions to do so without the possibility of branching. For instance, you could use either the & or the * operator like so:

    buffer[(index < 256) & (index != 0)] = elem;
    

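    Or, you could implement the behavior you actually want. Note that index < 256 && index evaluates to 0 or 1, never to index itself, so the original expressions only ever touch buffer[0] and buffer[1]. Multiplying the comparison result by index selects the intended slot: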

    buffer[(index < 256) * index] = elem;
    

    There's no reason to think that the compiler would emit a branch instruction for either of those computations; if it did, that would probably be because it judged a branch to be faster on the target architecture.
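
    To make the multiplication idea concrete, here is a minimal sketch that applies it to all three operations from the question. The names safe_index, safe_index_mask, BUF_SIZE and the Element layout are assumptions made for illustration, not part of the original code, and slot 0 is treated as a scratch slot that may be overwritten at any time. Whether the compiler really emits branch-free machine code still has to be checked in the generated assembly.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t, NULL */

#define BUF_SIZE 256u /* assumed capacity; slot 0 is the scratch slot */
static_assert((BUF_SIZE & (BUF_SIZE - 1u)) == 0,
              "BUF_SIZE must be a power of two");

/* Hypothetical payload type; the question does not define Element. */
typedef struct Element { int value; } Element;

static Element *buffer[BUF_SIZE];

/* Maps index to itself when index < BUF_SIZE, otherwise to 0.
   The comparison yields 0 or 1, so the product is 0 or index. */
static inline size_t safe_index(size_t index) {
    return (size_t)(index < BUF_SIZE) * index;
}

/* Same mapping with a mask instead of a multiply:
   0 - 1 wraps to all ones (keep index), 0 - 0 is zero (discard). */
static inline size_t safe_index_mask(size_t index) {
    return index & ((size_t)0 - (size_t)(index < BUF_SIZE));
}

/* Out-of-range writes land in the scratch slot 0. */
static inline void buffer_insert(size_t index, Element *elem) {
    buffer[safe_index(index)] = elem;
}

/* Stores elem only if the target slot currently holds NULL;
   otherwise the write is redirected to slot 0. */
static inline void buffer_checked_insert(size_t index, Element *elem) {
    size_t i = safe_index(index);
    buffer[(size_t)(buffer[i] == NULL) * i] = elem;
}

/* Clears the slot only if it currently holds elem;
   otherwise the NULL is written to slot 0. */
static inline void buffer_checked_remove(size_t index, const Element *elem) {
    size_t i = safe_index(index);
    buffer[(size_t)(buffer[i] == elem) * i] = NULL;
}
```

    With this sketch, buffer_insert(256, p) stores p in buffer[0] and leaves every valid slot alone, while buffer_checked_remove(5, p) clears buffer[5] only when it currently holds p.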

  • Can this be faster than a basic overflow check? Or could the C compiler somehow give a branchless version of if(index < 256) buffer[index] = elem?

    The branchless versions certainly can be faster. They are most likely to be observably faster on workloads where the (non-)branch is executed a lot, and there is no easily-discernible pattern to which alternative is taken. But if the (non-)branching mostly follows a regular pattern, and especially if it almost always goes one way, then the CPU's branch prediction unit could make an ordinary validity check at least as fast as the branchless assignments.

    Ultimately, there's no good reason to worry about this without benchmarking the actual performance of your code on real data, or a good facsimile thereof. The result is likely to be data dependent, and whether it matters at all depends on how much of the program's run time is spent in the functions you ask about. Until and unless you have a good benchmark demanding otherwise, you should code for clarity and maintainability.
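
    As a starting point for such a benchmark, the following is a rough sketch that times a branching insert against a branchless one over the same pre-generated index stream. Everything here is an assumption for illustration: the helper names (fill_indices, time_branchy, time_branchless), the xorshift generator, and the choice of indices in [0, 2 * BUF_SIZE) so that roughly half are out of range. clock() is coarse, and the compiler is free to turn either loop into the other form, so the timings only mean something alongside the generated assembly and realistic input data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define BUF_SIZE 256u /* assumed capacity, as in the question */

static void *slots_branchy[BUF_SIZE];
static void *slots_branchless[BUF_SIZE];

/* Fills indices with pseudo-random values in [0, 2 * BUF_SIZE) using a
   xorshift generator, so there is no pattern for the branch predictor.
   The seed must be nonzero or the generator stays at zero. */
static void fill_indices(size_t *indices, size_t n, uint32_t seed) {
    assert(seed != 0);
    uint32_t x = seed;
    for (size_t i = 0; i < n; i++) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        indices[i] = x % (2u * BUF_SIZE);
    }
}

/* Ordinary validity check; invalid indices are skipped. */
static double time_branchy(void **buf, const size_t *indices, size_t n,
                           void *elem) {
    clock_t start = clock();
    for (size_t i = 0; i < n; i++) {
        size_t index = indices[i];
        if (index != 0 && index < BUF_SIZE)
            buf[index] = elem;
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Multiplication variant; invalid indices are redirected to slot 0. */
static double time_branchless(void **buf, const size_t *indices, size_t n,
                              void *elem) {
    clock_t start = clock();
    for (size_t i = 0; i < n; i++) {
        size_t index = indices[i];
        buf[(size_t)(index < BUF_SIZE) * index] = elem;
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

    Both functions return elapsed CPU time in seconds. Running them with a random index stream and again with a mostly in-range stream should show whether branch prediction closes the gap on your hardware. Slots 1 through 255 end up identical in both buffers; only the scratch slot 0 differs.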