Take a look at the scenario below:
;some code
    test reg1,reg2
    je   jump1
    ;do something
    add  rsp,20
    pop  rdx
    ret
jump1:
    ;do something
    cmp  reg2,reg3
    jg   jump2
    add  rsp,20
    pop  rdx
    ret
jump2:
    ;do something
    add  rsp,20
    pop  rdx
    ret
Code like this isn't commonly seen in disassembly; perhaps compilers handle such cases more efficiently.
Can having multiple return statements affect performance? What are the possible performance outcomes of using a single return reached with jmp, compared to the above?
This is called the "tail duplication" optimization. Some compilers do sometimes do this; e.g. see the LLVM blog post about it.
It's generally a good thing when your function epilogues are small (only one pop), so duplicating them doesn't cost much, especially on modern x86 with its large caches and good code density (ret and pop are single-byte instructions). However, if only one path through the function is expected to be "hot", it may be better to have the other path(s) jmp to the hot path's epilogue to save a small amount of uop-cache space.
Tail duplication saves one taken jmp on each path out of the function that gets its own copy of the epilogue. The performance impact of that depends on the surrounding code, as always for a deeply pipelined superscalar out-of-order CPU!
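For comparison, here is a sketch of the single-shared-epilogue layout the question asks about, using the question's own placeholders (reg1, reg2, reg3 and ";do something"); the epilogue label name is made up for this illustration. Only one copy of add rsp,20 / pop rdx / ret remains, but the path that falls through the je now ends in a taken jmp, and the inner branch is inverted to jle so the former jump2 body can fall straight into the epilogue:

```nasm
;some code
    test reg1,reg2
    je   jump1
    ;do something
    jmp  epilogue      ; taken jmp to the single shared epilogue
jump1:
    ;do something
    cmp  reg2,reg3
    jle  epilogue      ; not-greater path: branch straight to the epilogue
    ;do something      ; formerly the jump2 body, now reached by falling through
epilogue:              ; hypothetical label: the one shared copy of the epilogue
    add  rsp,20
    pop  rdx
    ret
```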
If multiple paths through a function could be hot depending on how your function is used, they can both/all be fully efficient.
You can also do this for loops that have a branch inside the loop body: duplicate the dec/jcc (or whatever the loop condition is) at the bottom of each side of the branch instead of jumping to a common dec/jcc. (Don't forget to handle the fall-through path in both / all cases!)
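A sketch of that loop transformation, in the same style as the question's code. It assumes ecx holds a nonzero iteration count and that reg1/reg2 stand in for whatever the in-loop condition is; the label names are made up for illustration. Each side of the branch ends with its own dec/jnz, and the first side needs an explicit jmp past the second side once the loop is finished:

```nasm
top:
    test reg1,reg2
    jz   other_side
    ;first side of the branch
    dec  ecx
    jnz  top           ; loop back directly, no jmp to a shared dec/jnz
    jmp  done          ; fall-through (loop finished) must skip the other side
other_side:
    ;second side of the branch
    dec  ecx
    jnz  top
done:                  ; the second side's fall-through lands here naturally
    ;code after the loop
```

Without the duplication, the first side would end in a jmp to a shared dec ecx / jnz top at the bottom of the loop, costing a taken jmp every iteration that goes down that side.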