Does a function such as this have a negative effect on performance?
fn:
cmp rdi, 0
je lbl0
...
ret
lbl0:
...
ret
call fn
And this one?
fn0:
... ; no ret, fall through
fn1:
...
ret
Fall through is the most efficient thing you can do; it's just normal execution. The CPU can't even know the difference between "2 different functions" vs. labels within a function; it's all just machine code. Labels are zero-width, and just give you a way to refer to that address from elsewhere.
From a high level you could look at it as an optimized tailcall of the 2nd function, like you'd do with jmp fn1 instead of call fn1; ret, and then of course optimizing away the jmp +0, because jumping to the next instruction is architecturally a nop.
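As a rough sketch of that progression, here are three ways of writing the same thing in NASM syntax. The function bodies (increment, then double) and the names fn0_call and fn0_jmp are made up for illustration; only fn0 and fn1 come from the question.

```nasm
section .text

; Version A: ordinary call followed by ret.
fn0_call:
    inc  rdi                 ; fn0's own (hypothetical) work
    call fn1                 ; pushes a return address; fn1's ret comes back here
    ret                      ; then we return fn1's result (in rax) to our caller

; Version B: tail call. fn1's ret returns straight to fn0's caller.
fn0_jmp:
    inc  rdi
    jmp  fn1

; Version C: fn1 is placed immediately after fn0, so the jmp would
; target the very next instruction and can simply be dropped.
fn0:
    inc  rdi
    ; no jmp, no ret: execution continues into fn1
fn1:
    lea  rax, [rdi + rdi]    ; hypothetical body: rax = 2 * rdi
    ret
```

All three entry points return the same value for the same input; version C just executes the fewest instructions to get there.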
As for the first one, that's called "tail duplication" optimization, where multiple paths out of a function duplicate any necessary cleanup (pop rbx or whatever) and a ret, instead of running an extra jmp to reach a single copy of the cleanup.
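To make the trade-off concrete, here is a minimal sketch of the same function written with a single shared exit and with a duplicated tail. The names fn_single_exit and fn_dup, the use of rbx, and the values computed are all assumptions for the example, not something from the question.

```nasm
section .text

; Single exit: one copy of the cleanup, but the non-zero path
; pays for an extra taken jmp to reach it.
fn_single_exit:
    push rbx                 ; call-preserved register we want to use
    mov  rbx, rdi
    test rdi, rdi            ; sets ZF the same way as cmp rdi, 0
    jz   .zero
    lea  rax, [rbx + rbx]    ; hypothetical work for the non-zero case
    jmp  .done               ; extra jump just to reach the cleanup
.zero:
    mov  eax, 1              ; hypothetical result for the zero case
.done:
    pop  rbx
    ret

; Tail-duplicated: each path has its own copy of the cleanup and ret.
fn_dup:
    push rbx
    mov  rbx, rdi
    test rdi, rdi
    jz   .zero
    lea  rax, [rbx + rbx]
    pop  rbx                 ; cleanup, copy 1
    ret
.zero:
    mov  eax, 1
    pop  rbx                 ; cleanup, copy 2
    ret
```

The duplicated version is two instructions longer in the file (an extra pop and ret, minus the jmp), but every call through the non-zero path executes one instruction fewer.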
Tail duplication costs code footprint (static code size) but results in fewer dynamic instructions executed per call. It doesn't generally hurt branch prediction; ret is predicted by a stack-like predictor that matches ret with call (i.e. it assumes that ret will return to the last call that executed). As long as this is still true (which it is here), you don't have a problem. You have multiple ways out of the function, but exactly one of them runs for each call to it.
You can also do loop tail duplication, where you branch inside the loop and each side of the branch separately has a dec ecx / jnz .top_of_loop (with any necessary jmp or whatever outside the loop).
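A sketch of what that can look like, assuming a made-up loop that sums 32-bit integers into two accumulators depending on sign. The function name sum_abs, the register assignments, and the requirement that the count is non-zero are all assumptions for the example.

```nasm
section .text

; Assumed inputs:  rsi -> array of int32, ecx = element count (must be > 0)
; Output:          eax = (sum of non-negative elements) - (sum of negative elements)
sum_abs:
    xor  eax, eax            ; accumulator for non-negative elements
    xor  edx, edx            ; accumulator for negative elements
.top_of_loop:
    mov  edi, [rsi]
    add  rsi, 4
    test edi, edi
    js   .negative           ; branch inside the loop body
    add  eax, edi
    dec  ecx
    jnz  .top_of_loop        ; loop branch, copy 1
.done:
    sub  eax, edx            ; work after the loop
    ret
.negative:
    add  edx, edi
    dec  ecx
    jnz  .top_of_loop        ; loop branch, copy 2
    jmp  .done               ; runs once at loop exit, not once per iteration
```

With a single shared dec ecx / jnz, the non-negative path would have to jmp over the .negative block on every iteration; here the only extra jmp is outside the loop.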