c++ alignment cpu-cache

Is std::hardware_constructive_interference_size ever useful?


Existing questions in this area still don't ask exactly what I'm asking:

The answer to the second one is what actually prompted this question.

So, assume I want constructive interference, and I'm putting a few variables into a single struct that fits within std::hardware_constructive_interference_size:

struct together
{
  int a;
  int b;
};

The advantage seems too weak to justify failing compilation with a static_assert if the struct does not fit:

// Not going to do the below:
static_assert(sizeof(together) <= std::hardware_constructive_interference_size);

Still, aligning is helpful to keep the structure from spanning two cache lines:

struct alignas(std::hardware_constructive_interference_size) together
{
  int a;
  int b;
};
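As a side note on the snippet above, a minimal sketch of how it could be made to compile everywhere: not every standard library implementation defines the constant, so the feature-test macro __cpp_lib_hardware_interference_size can guard its use. The name constructive_size and the 64-byte fallback are my own assumptions, not something the standard prescribes.

```cpp
#include <cstddef>
#include <new>  // std::hardware_constructive_interference_size, where available

// Use the library constant only if the implementation advertises it.
#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t constructive_size =
    std::hardware_constructive_interference_size;
#else
// Assumption: 64 bytes is a common cache line size; adjust for the target.
inline constexpr std::size_t constructive_size = 64;
#endif

struct alignas(constructive_size) together
{
  int a;
  int b;
};

// An object aligned to N whose payload is at most N bytes
// cannot straddle an N-byte boundary.
static_assert(alignof(together) == constructive_size);
static_assert(sizeof(together) % constructive_size == 0);
```

Note that the alignment also pads sizeof(together) up to constructive_size, so an array of these structs uses one full line per element.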

However, the same effect can be achieved by aligning just on the structure size:

struct alignas(std::bit_ceil(2*sizeof(int))) together
{
  int a;
  int b;
};

If the structure is larger than std::hardware_constructive_interference_size, it may still be helpful to align it on its own size, because:

  • The constant is a compile-time hint that may become obsolete on later CPUs that the compiled program runs on.
  • It describes the cache line size of only one cache level. If there are several levels with different line sizes, a structure that exceeds one level's cache line may still benefit from sharing a line at another level.
  • Aligning on the structure size never costs more than about twice the space. Aligning on the cache line size may cost more if the cache line size becomes much larger than the structure size.

So, is there any point left for std::hardware_constructive_interference_size?


Solution

  • Consider a std::deque<T>. It's often implemented using chunks of a given size. But how many T's do you store per chunk? A reasonable answer is std::hardware_constructive_interference_size/sizeof(T), if sizeof(T) is small.

    Similarly, a string class with the Small String Optimization may aim for a total size of std::hardware_constructive_interference_size. In general, the constant is useful when you have a run-time-variable amount of data with high locality of reference.
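A minimal sketch of the chunk-size idea, not how any particular std::deque implementation actually does it. The names line_size, elements_per_chunk, and chunk are hypothetical, and the 64-byte fallback is an assumption for libraries that don't define the constant.

```cpp
#include <cstddef>
#include <new>

#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t line_size =
    std::hardware_constructive_interference_size;
#else
inline constexpr std::size_t line_size = 64;  // assumption, see above
#endif

// Hypothetical helper: how many T's fit into one line, but at least one.
template <class T>
constexpr std::size_t elements_per_chunk()
{
  return sizeof(T) < line_size ? line_size / sizeof(T) : 1;
}

// Hypothetical chunk: small T's fill one line and never straddle two.
// Assumes alignof(T) <= line_size.
template <class T>
struct alignas(line_size) chunk
{
  T items[elements_per_chunk<T>()];
};

static_assert(elements_per_chunk<char>() == line_size);
static_assert(elements_per_chunk<int>() * sizeof(int) <= line_size);
static_assert(sizeof(chunk<char>) == line_size);
```

For a T larger than line_size the helper falls back to one element per chunk, which is where the "if sizeof(T) is small" condition comes in.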