Is there a runtime cost of assigning intermediate values to variables compared to just including their literals in an expression?
For example, is temporary_assignments slower or otherwise inferior to inlined?
// Use variable assignment to assign names to the intermediate components of the
// final answer, then combine those names in the final answer.
fn temporary_assignments(a: f32, b: f32, c: f32) -> f32 {
let fourac = 4. * a * c;
let discrim = b * b - fourac;
let rad = discrim.sqrt();
let denom = 2. * a;
(-b + rad) / denom
}
// Express the entire final answer with literals.
fn inlined(a: f32, b: f32, c: f32) -> f32 {
(-b + (b * b - 4. * a * c).sqrt()) / (2. * a)
}
In what situations does "inlining" expressions in this manner offer a runtime cost improvement compared to using variables to store intermediate values?
No, intermediate variables do not have a runtime cost in Rust.
While different ways of writing a (semantically equivalent) function might lead to slightly different CPU instructions and performance, there is generally no correlation (neither positive nor negative) between the number of intermediate variables and the quality of the assembly generated by a modern optimizing compiler like rustc. In many cases, the output will not differ at all.
If you look at the optimized assembly output for both your examples, you will find that it is nearly identical, and should perform about the same too.
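Before looking at assembly at all, it is easy to convince yourself that the two versions are semantically equivalent by checking them against each other. This is just a sanity check on the functions from the question (using a quadratic with a known root), not the assembly comparison itself:

```rust
fn temporary_assignments(a: f32, b: f32, c: f32) -> f32 {
    let fourac = 4. * a * c;
    let discrim = b * b - fourac;
    let rad = discrim.sqrt();
    let denom = 2. * a;
    (-b + rad) / denom
}

fn inlined(a: f32, b: f32, c: f32) -> f32 {
    (-b + (b * b - 4. * a * c).sqrt()) / (2. * a)
}

fn main() {
    // x^2 - 3x + 2 = 0 has roots 1 and 2; this formula yields the larger root.
    let (a, b, c) = (1.0_f32, -3.0, 2.0);
    assert_eq!(temporary_assignments(a, b, c), inlined(a, b, c));
    assert_eq!(inlined(a, b, c), 2.0);
}
```

Since both bodies perform the same floating-point operations in the same order, they produce bit-identical results, which is also why the compiler is free to lower them to the same machine code.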
Just like C/C++, Rust is a statically compiled language, with an optimizing compiler backend (LLVM in case of rustc).
Your CPU does not have a concept of 'variables'; it has registers and memory. So the compiler analyzes the source code that you wrote and tries to figure out the most efficient way to express those semantics using your CPU's instruction set. The Rust compiler with its LLVM backend does this by first lowering your code into an intermediate representation (LLVM-IR), in which named source variables are replaced by virtual registers.
For example, here is the (non optimized) LLVM-IR for your inlined
function:
define float @inlined(float %0, float %1, float %2) unnamed_addr {
%4 = fneg float %1
%5 = fmul float %1, %1
%6 = fmul float 4.000000e+00, %0
%7 = fmul float %6, %2
%8 = fsub float %5, %7
%9 = call float @"<sqrt_function>"(float %8)
%10 = fadd float %4, %9
%11 = fmul float 2.000000e+00, %0
%12 = fdiv float %10, %11
ret float %12
}
As you can see, this representation uses virtual registers (e.g. %7) for the result of literally every single operation. These virtual registers will later be turned into the actual physical registers of your CPU architecture during register allocation. So your source code variables are already eliminated before serious optimization work has even begun.
Now, this IR will run through many optimization passes. Some interesting ones are:
mem2reg: Instead of storing data in memory and retrieving it using load and store instructions, just keep the data in a register and operate on it directly. This can be significantly more efficient, but is not always possible; we might have passed around references that rely on the data having a memory address. If you were copying around any structs in your example, this optimization pass (in combination with others) would help to get rid of many unnecessary copies.

cse (common subexpression elimination): Reuse results we already computed somewhere else, instead of recomputing them.

inlining: Instead of calling a function, copy-paste the IR of that function into our function. This gives the optimizer much more context about what is going on (which majorly helps other optimization passes), but can of course increase the binary size as we duplicate code, so we can't always do this. A good inlining strategy is often considered the most critical part of an optimizing compiler. This is also why virtual function calls (or dyn trait calls in Rust) are often considered expensive: the compiler usually can't inline them.

While lots of work goes into making backends like LLVM emit good assembly, these tools aren't magic. If you do something that the compiler doesn't 'understand' (== has an optimization pass for), it won't help you. So if performance is critical, use tools like the fantastic godbolt.org to figure out what is going on in your concrete case.
Most importantly though, please don't do weird premature 'optimizations' like omitting intermediate variables that would improve readability for the sake of 'performance' without understanding what's actually going on :).
If you clone a complex structure like a Vec or a large array, where actual work has to be done, this is of course different. (In some cases compilers can even optimize that away, though.) Also note that rustc performs barely any of these optimizations in Debug mode, so make sure to benchmark Release builds.
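As a small sketch of why clone is different from a mere intermediate variable: cloning a Vec allocates a new buffer and copies every element, which is real runtime work that no amount of renaming will remove:

```rust
fn main() {
    let original: Vec<i32> = (0..1000).collect();

    // `clone` allocates a fresh buffer and copies all 1000 elements.
    let mut copy = original.clone();
    copy[0] = 42;

    // The two vectors are independent allocations: mutating the
    // clone does not affect the original.
    assert_eq!(original[0], 0);
    assert_eq!(copy[0], 42);
    assert_eq!(original.len(), copy.len());
}
```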