c++ performance memory-management move-semantics data-oriented-design

C++ choice of pass by value vs pass by reference for POD math structure classes for high performance applications considering cache coherency


For many high-performance applications, such as game engines or financial software, considerations of cache coherency, memory layout, and cache misses are crucial for maintaining smooth performance. As the C++ standard has evolved, especially with the introduction of move semantics in C++11 and the refinements in C++14, it has become less clear where to draw the line between pass by value and pass by reference for POD-based mathematical classes.

Consider the common POD Vector3 class:

// Assumption: float32 is an alias for a 4-byte IEEE 754 float.
using float32 = float;

class Vector3
{
public:
   float32 x;
   float32 y;
   float32 z;
   // Implementation functions below (all non-virtual)...
};

This is the most commonly used math structure in game development. It is a non-virtual, 12-byte class, even on 64-bit platforms, since we are explicitly using IEEE float32, which takes 4 bytes per float. My question is as follows: what is the general best-practice guideline for deciding whether to pass POD mathematical classes by value or by reference in high-performance applications?

Some things for consideration when answering this question:

  • It is safe to assume the default constructor does not initialize any values
  • It is safe to assume no arrays beyond 1D are used for any POD math structures
  • Clearly most people pass 4-8 byte POD constants by value, so there doesn't seem to be much debate there
  • What happens when this Vector is a class member variable vs. a local variable on the stack? If pass by reference is used, the callee receives the address of the member inside the class instance rather than the address of something local on the stack. Does this use case matter? Could pass by reference result in more cache misses in the member-variable case?
  • What about the case where SIMD is used or not used?
  • What about move-semantics compiler optimizations? I have noticed that when switching to C++14, the compiler will often use move semantics when chained function calls pass the same vector by value, especially when it is const. I observed this by perusing the generated assembly
  • When using pass by value and pass by reference with these math structures, does const have much impact on compiler optimizations? See the above point

Given the above, what is a good guideline for when to use pass by value vs. pass by reference with modern C++ compilers (C++14 and above) to minimize cache misses and promote cache coherency? At what point might someone say a POD math structure is too large for pass by value, such as a 4x4 affine transform matrix, which is 64 bytes in size assuming use of float32? Does it matter for this decision whether the Vector, or rather any small POD math structure, is declared on the stack or referenced as a member variable?
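To make the options concrete, here is a minimal sketch of the two calling styles being compared. The function names (dot_by_value, dot_by_ref, transform_point) and the Matrix4 layout are hypothetical, and plain float is assumed to be the 4-byte IEEE type; the static_asserts simply check the 12-byte and 64-byte sizes quoted above.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the types discussed above.
// Assumption: float is a 4-byte IEEE 754 type on the target platform.
struct Vector3 { float x, y, z; };
struct Matrix4 { float m[16]; };  // assumed row-major 4x4 layout

static_assert(sizeof(Vector3) == 12, "three 4-byte floats, no padding");
static_assert(sizeof(Matrix4) == 64, "sixteen 4-byte floats");

// Option 1: pass by value. The callee works on its own copies.
inline float dot_by_value(Vector3 a, Vector3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Option 2: pass by const reference. The callee reads through the
// caller's objects, wherever they live (stack local or class member).
inline float dot_by_ref(const Vector3& a, const Vector3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Mixed style: the 64-byte matrix by const reference, the 12-byte
// point by value. Treats p as (x, y, z, 1) and ignores the last row.
inline Vector3 transform_point(const Matrix4& m, Vector3 p) {
    return Vector3{
        m.m[0] * p.x + m.m[1] * p.y + m.m[2]  * p.z + m.m[3],
        m.m[4] * p.x + m.m[5] * p.y + m.m[6]  * p.z + m.m[7],
        m.m[8] * p.x + m.m[9] * p.y + m.m[10] * p.z + m.m[11]
    };
}
```

When such functions are inlined, the two dot-product variants may well compile to the same code, so any comparison should be made by inspecting the generated assembly or profiling on the target compiler and platform.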

I am hoping someone can provide some analysis and insight so that a good modern best-practice guideline can be established for the above situation. I believe the line between pass by value (PBV) and pass by reference (PBR) for POD classes has become blurrier as the C++ standard has evolved, especially with regard to minimizing cache misses.


Solution

  • I see the question title is about the choice of pass-by-value vs. pass-by-reference, though it sounds like what you are after more broadly is the best practice for efficiently passing around 3D vectors and other common PODs. Passing data is fundamental and intertwined with the programming paradigm, so there isn't a consensus on the best way to do it. Besides performance, there are considerations such as code readability, flexibility, and portability to weigh when deciding which approach to favor in a given application.

    That said, in recent years, "data-oriented design" has become a popular alternative to object-oriented programming, especially in video game development. The essential idea is to think about the program in terms of data it needs to process, and how all that data can be organized in memory for good cache locality and computation performance. There was a great talk about it at CppCon 2014: "Data-Oriented Design and C++" by Mike Acton.

    With your Vector3 example, for instance, it is often the case that a program has not just one but many 3D vectors that are all processed the same way, say, all undergoing the same geometric transformation. Data-oriented design suggests it is then a good idea to lay the vectors out contiguously in memory and transform them all together in a batch operation. This improves caching and creates opportunities to leverage SIMD instructions. You could implement this example with the Eigen C++ linear algebra library. The vectors can be represented using an Eigen::Matrix<float, 3, Eigen::Dynamic> of shape 3xN to store N vectors, then manipulated using Eigen's SIMD-accelerated operations.
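
As a rough illustration of the batch idea, here is a minimal sketch that uses only the standard library instead of Eigen, so it stays dependency-free. The Vector3Batch type and the scale_and_translate function are hypothetical names made up for this example. Each component is stored in its own contiguous array, and one loop applies the same transformation to every vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays container: one contiguous array per
// component, so N vectors occupy three tightly packed float arrays.
struct Vector3Batch {
    std::vector<float> x, y, z;

    explicit Vector3Batch(std::size_t n) : x(n), y(n), z(n) {}
    std::size_t size() const { return x.size(); }
};

// Applies p' = scale * p + offset to every vector in the batch, in place.
// The batch itself is passed by reference; no per-vector argument passing
// happens inside the loop.
inline void scale_and_translate(Vector3Batch& batch, float scale,
                                float ox, float oy, float oz) {
    const std::size_t n = batch.size();
    assert(batch.y.size() == n && batch.z.size() == n);

    float* x = batch.x.data();
    float* y = batch.y.data();
    float* z = batch.z.data();

    // A simple loop over contiguous floats is a candidate for compiler
    // auto-vectorization; whether that happens depends on the compiler
    // and its optimization flags.
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = scale * x[i] + ox;
        y[i] = scale * y[i] + oy;
        z[i] = scale * z[i] + oz;
    }
}
```

With this layout the pass-by-value vs. pass-by-reference question mostly moves to the batch boundary: the whole batch is handed over once by reference, and the per-element work runs over sequential memory.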