I am studying 3D rendering with OpenGL and C, and I'm writing a small math library as a learning exercise. Is it better for the matrix multiplication function to return its result with a return statement, or to write it to an output matrix through a pointer?
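typedef float vec_t;

typedef struct mat4_s {
    vec_t m[4][4];
} mat4_t;

/* Option 1: write the result through an output pointer */
void Mat4Mult(mat4_t* out, const mat4_t* in1, const mat4_t* in2) {
    out->m[0][0] = /* ... */;
    out->m[1][0] = /* ... */;
    /* ... */
}

/* Option 2: return the result by value.
   (C has no function overloading, so only one of the two can be named Mat4Mult.) */
mat4_t Mat4Mult(const mat4_t* in1, const mat4_t* in2) {
    mat4_t result;
    result.m[0][0] = /* ... */;
    result.m[1][0] = /* ... */;
    /* ... */
    return result;
}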
I want to understand which option is more correct. I think both are correct, but I prefer returning the result with a return statement. Please correct me if I'm wrong; I haven't fully mastered C.
It is very difficult to answer questions like this from intuition, even with a mountain of experience, which is why you should try both and profile the results. Let's compare the following naive 4x4 matrix multiplication functions:
void Mat4Mult_Dest(mat4_t* out, const mat4_t* in1, const mat4_t* in2) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            out->m[i][j] = 0;
            for (int k = 0; k < 4; ++k) {
                out->m[i][j] += in1->m[i][k] * in2->m[k][j];
            }
        }
    }
}
mat4_t Mat4Mult_Ret(const mat4_t* in1, const mat4_t* in2) {
    mat4_t out = {0};
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            out.m[i][j] = 0;
            for (int k = 0; k < 4; ++k) {
                out.m[i][j] += in1->m[i][k] * in2->m[k][j];
            }
        }
    }
    return out;
}
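To reproduce a comparison like this yourself, a minimal timing harness might look like the sketch below. It is an assumption on my part, not the exact benchmark used here: the helper names now_seconds, bench_dest and bench_ret are hypothetical, and it uses the standard clock() (processor time) for portability. Each product is fed back in as the next left operand, so the compiler cannot simply drop the calls, and both functions live in the same translation unit, so the compiler is free to inline them into the loops, which is exactly the effect being measured.

```c
#include <assert.h>
#include <time.h>

typedef float vec_t;

typedef struct mat4_s {
    vec_t m[4][4];
} mat4_t;

void Mat4Mult_Dest(mat4_t* out, const mat4_t* in1, const mat4_t* in2) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            out->m[i][j] = 0;
            for (int k = 0; k < 4; ++k) {
                out->m[i][j] += in1->m[i][k] * in2->m[k][j];
            }
        }
    }
}

mat4_t Mat4Mult_Ret(const mat4_t* in1, const mat4_t* in2) {
    mat4_t out;
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            out.m[i][j] = 0;
            for (int k = 0; k < 4; ++k) {
                out.m[i][j] += in1->m[i][k] * in2->m[k][j];
            }
        }
    }
    return out;
}

/* Hypothetical helper: processor time in seconds (standard C clock()). */
static double now_seconds(void) {
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Returns the average time per multiplication; iters must be positive. */
double bench_dest(long iters, mat4_t* acc, const mat4_t* rhs) {
    assert(iters > 0);
    mat4_t tmp;
    double t0 = now_seconds();
    for (long n = 0; n < iters; ++n) {
        Mat4Mult_Dest(&tmp, acc, rhs); /* separate tmp: out must not alias acc */
        *acc = tmp;                    /* feed the result back in */
    }
    return (now_seconds() - t0) / (double)iters;
}

/* Same measurement for the return-by-value version. */
double bench_ret(long iters, mat4_t* acc, const mat4_t* rhs) {
    assert(iters > 0);
    double t0 = now_seconds();
    for (long n = 0; n < iters; ++n) {
        *acc = Mat4Mult_Ret(acc, rhs); /* feed the result back in */
    }
    return (now_seconds() - t0) / (double)iters;
}
```

Call both with rhs set to the identity matrix so the values stay bounded (no overflow to infinity and no slow denormals), use a few million iterations, and compile the same file with both GCC and Clang at -O2 before comparing the numbers.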
The results vary significantly between GCC and Clang. Judging from the assembly, this is probably because Clang inlined the _Ret version but not the _Dest version. GCC inlined both functions, so they performed essentially the same. That is unsurprising, because the two functions perform the same calculations.
According to the benchmarks, returning by value is at least as fast as writing to a destination matrix. It is more inlining-friendly for some compilers, which may improve performance. However, you could likely achieve the same results by annotating your functions so that they are more likely to be inlined.
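One way to nudge the compiler, sketched below under the assumption that the functions live in a header, is to define them as static inline and, on GCC and Clang, add __attribute__((always_inline)). The macro name MAT4_INLINE is hypothetical; the fallback branch only gives the ordinary static inline hint, which the compiler is free to ignore. The debug assert documents that out must not overlap an input.

```c
#include <assert.h>

typedef float vec_t;

typedef struct mat4_s {
    vec_t m[4][4];
} mat4_t;

/* Hypothetical macro: force inlining on GCC/Clang, plain hint elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define MAT4_INLINE static inline __attribute__((always_inline))
#else
#define MAT4_INLINE static inline
#endif

/* Placed in a header, a static inline body is visible in every translation
   unit that includes it, which is what lets the optimizer inline the call. */
MAT4_INLINE void Mat4Mult_Dest(mat4_t* out, const mat4_t* in1, const mat4_t* in2) {
    assert(out != in1 && out != in2); /* out must not overlap an input */
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            out->m[i][j] = 0;
            for (int k = 0; k < 4; ++k) {
                out->m[i][j] += in1->m[i][k] * in2->m[k][j];
            }
        }
    }
}

MAT4_INLINE mat4_t Mat4Mult_Ret(const mat4_t* in1, const mat4_t* in2) {
    mat4_t out;
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            out.m[i][j] = 0;
            for (int k = 0; k < 4; ++k) {
                out.m[i][j] += in1->m[i][k] * in2->m[k][j];
            }
        }
    }
    return out;
}
```

Whether forcing inlining actually helps depends on the call site, so it is worth re-running the benchmark after adding the attribute rather than assuming it.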
It is worth noting that in Mat4Mult_Ret, return out; writes to a destination in place anyway: under the System V x86-64 ABI, a struct as large as mat4_t (64 bytes) is returned through a hidden pointer that the caller passes in rdi:
Mat4Mult_Ret:
// ...
// last 4 instructions move result to destination pointer
movups xmmword ptr [rdi + 48], xmm1
movups xmmword ptr [rdi + 32], xmm7
movups xmmword ptr [rdi + 16], xmm4
movups xmmword ptr [rdi], xmm3
ret
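There is one notable difference between your functions, though: mat4_t* out can alias in1 or in2, but a local mat4_t out cannot. Because a store through out might change the inputs, the compiler has to assume the inputs can change after every write, which limits how long it can keep them in registers. A call such as Mat4Mult_Dest(&a, &a, &b) can even produce a wrong result, since a is overwritten while it is still being read. Consider marking your pointers restrict to give the compiler more optimization freedom; the trade-off is that calls where out overlaps an input then become undefined behavior.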
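A sketch of the restrict variant, plus a hypothetical wrapper that keeps in-place calls safe, is shown below. The names Mat4Mult_Restrict and Mat4Mult_Safe are my own. The wrapper computes into a local and copies at the end, which mirrors what the return-by-value version does with its local out. Note that in1 and in2 may still point to the same matrix (for example, when squaring), because restrict only forbids overlap with an object that is modified during the call.

```c
#include <assert.h>

typedef float vec_t;

typedef struct mat4_s {
    vec_t m[4][4];
} mat4_t;

/* restrict promises that the matrix written through out is not accessed
   through in1 or in2 during the call, so the compiler may keep input
   elements in registers across stores. Breaking the promise is undefined
   behavior; the assert is only a debug-build check of it. */
void Mat4Mult_Restrict(mat4_t* restrict out,
                       const mat4_t* restrict in1,
                       const mat4_t* restrict in2) {
    assert(out != in1 && out != in2);
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            out->m[i][j] = 0;
            for (int k = 0; k < 4; ++k) {
                out->m[i][j] += in1->m[i][k] * in2->m[k][j];
            }
        }
    }
}

/* Hypothetical aliasing-safe wrapper: the temporary never overlaps the
   inputs, so calls such as Mat4Mult_Safe(&a, &a, &b) are valid. */
void Mat4Mult_Safe(mat4_t* out, const mat4_t* in1, const mat4_t* in2) {
    mat4_t tmp;
    Mat4Mult_Restrict(&tmp, in1, in2);
    *out = tmp; /* copy only after both inputs have been fully read */
}
```

In other words, the pointer API can offer the same aliasing guarantee as returning by value, but only if you pay for the temporary explicitly or document that out must not overlap the inputs.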