Why does LLVM not optimize away unnecessary jmps for tail calls to the next function?

The following LLVM IR:

define tailcc i64 @f() {
  %1 = musttail call tailcc i64 @g(i64 10)
  ret i64 %1
}

define tailcc i64 @g(i64 %0) align 1 optsize noinline {
  ret i64 %0
}

produces this x86-64 object code (clang-18 -O3 -c test.ll -o test.o && objdump -d test.o):

0000000000000000 <f>:
   0:   bf 0a 00 00 00          mov    $0xa,%edi
   5:   e9 00 00 00 00          jmp    a <g>

000000000000000a <g>:
   a:   48 89 f8                mov    %rdi,%rax
   d:   c2 08 00                ret    $0x8

Why is the jmp from f to g not optimized away? It is not needed, as g is immediately below f.

Solution

LLVM generally optimizes by running passes. There are several kinds of pass, and function passes, which operate on one function at a time, are one of them.

LLVM tries to make it easy to write and combine passes. It would in fact be quite easy to write a pass that does what you're suggesting, but doing so would have a side effect: it would add a new requirement on every subsequent pass.

For example, once such a pass exists, other passes can no longer analyse call and invoke instructions to find out which functions are called, because the new pass has added another way to enter a function, namely falling through from the end of the previous one.
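To make that concrete, here is a minimal sketch in plain C++. It is a toy model, not LLVM's actual data structures or API: Instr, Function, calleesOf, fWithJmp and fFallingThrough are all names made up for illustration. It shows an analysis that finds callees by scanning explicit call-like instructions, and how that analysis silently loses an edge once the jump is elided.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical and do not mirror LLVM's classes.
struct Instr {
    std::string opcode;  // e.g. "mov", "call", "jmp", "ret"
    std::string target;  // callee name for call/jmp, empty otherwise
};

struct Function {
    std::string name;
    std::vector<Instr> body;
};

// Finds callees the way a simple analysis would: by looking only at
// explicit control transfers (calls and tail-call jumps).
std::set<std::string> calleesOf(const Function& f) {
    std::set<std::string> callees;
    for (const Instr& i : f.body) {
        bool transfers = (i.opcode == "call" || i.opcode == "jmp");
        if (transfers && !i.target.empty()) {
            callees.insert(i.target);
        }
    }
    return callees;
}

// f as in the question: load the argument, then tail-call g with a jmp.
Function fWithJmp() {
    return Function{"f", {{"mov", ""}, {"jmp", "g"}}};
}

// f after the hypothetical "elide the jmp" pass: control simply runs off
// the end of f into whatever is laid out next.
Function fFallingThrough() {
    return Function{"f", {{"mov", ""}}};
}
```

With the jmp in place, calleesOf reports g as a callee of f. After the elision it reports nothing, even though f still ends up in g, so every later consumer of this information would also have to know about layout order.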

Is the effect worth the added complexity? I'd have to say no. LLVM's simple interface is a valuable feature, much more valuable than eliding one assembly instruction.