Inline assembly with "jmp 0f" or "b 0f" at the beginning

updated

Changed the 2nd line of assembly to the mnemonic actually being used (mflr) and added more info at the bottom.

I ran across some code (using gcc) resembling the following (paraphrased):

#define SOME_MACRO( someVar ) \
do {                          \
  __asm__ (                   \
    "    b 0f\n"              \
    "0:  mflr %0\n"           \
    : "=r"( someVar )         \
  );                          \
} while(0)

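... where the b instruction (PowerPC) is an unconditional short branch, the equivalent of jmp, and mflr copies the contents of the 'link register' into a general-purpose register. The link register holds the return address after a branch-and-link, so it is similar to the program counter in some respects. I've seen this sort of thing in Intel code as well (cf. the accepted answer in this question).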

The branch acts as a no-op ... my question: what purpose does this serve?

I'm guessing it has something to do with branch prediction stuff, but so far I've only found people's code using this idiom while searching.

It looks like I was wrong on the branch prediction guess. mflr grabs the contents of the link register.

So, my question boils down to: why is the branch necessary?

Solution

The interesting bits of code like this tend to happen in somethingelse, i.e. in the instruction(s) at the branch target (mflr in the updated question). Some known purposes of such code are:

runtime state retrieval: on x86, for example,
__asm__("call 0f\n0: pop %0\n" : "=r"(pc))
is a way to retrieve the program counter (the IP register, which is hidden and not directly accessible; call pushes it onto the stack as the return address, and pop reads it back).
Beware that this isn't safe in leaf functions in 64-bit mode because of the red zone: the push done by call overwrites memory below the stack pointer that the compiler may be using for locals (see Inline assembly that clobbers the red zone). The correct way to do it on x86_64 is
asm("lea 0f(%%rip), %0\n0:\n" : "=r"(pc))
which exploits the fact that PC-relative addressing is available in 64-bit mode.
instrumentation (debugging / runtime tracing): e.g. by placing tracing code or NOP slots there that tracing utilities can patch at runtime to hook into the code dynamically. Solaris DTrace uses such techniques.
embedding constants: on ARM (and 64-bit x86), the method is also used to embed constants within the code, for use with PC-relative loads. The branch skips over the data so that it is never executed.
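
As a rough illustration of the first and last purposes above, here is a minimal sketch for GCC or Clang. The function names current_pc and embedded_constant and the constant value are made up for this example. The x86_64 paths assume AT&T assembler syntax and a readable text segment; on any other target the code falls back to plain C that only approximates the behaviour.

```c
#include <stdint.h>

/* Hypothetical helper: returns an address inside this function.
 * noinline keeps the asm in one place instead of being copied into callers. */
__attribute__((noinline))
uintptr_t current_pc(void)
{
    uintptr_t pc;
#if defined(__x86_64__)
    /* RIP-relative lea: local label 0 marks the next instruction.
     * Nothing is pushed, so the red zone is left untouched. */
    __asm__ volatile ("lea 0f(%%rip), %0\n"
                      "0:\n"
                      : "=r"(pc));
#else
    /* Fallback (assumption, not the same thing): the caller's return
     * address, i.e. a code address near the call site. */
    pc = (uintptr_t)__builtin_return_address(0);
#endif
    return pc;
}

/* Hypothetical helper: stores a constant in the instruction stream,
 * branches over it, then reads it back with a PC-relative load. */
uint64_t embedded_constant(void)
{
    uint64_t value;
#if defined(__x86_64__)
    __asm__ volatile ("jmp 1f\n"                      /* skip the data    */
                      "0: .quad 0x1122334455667788\n" /* data in the code */
                      "1: mov 0b(%%rip), %0\n"        /* PC-relative load */
                      : "=r"(value));
#else
    /* Fallback (assumption): no inline assembly, just return the constant. */
    value = UINT64_C(0x1122334455667788);
#endif
    return value;
}
```

In the second function the jmp is what makes the pattern work: without it the CPU would try to execute the bytes of the constant as instructions. That is one concrete reason a branch to the very next label can be more than a no-op once something is placed between the branch and its target.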

Whether unconditional branches like this cause branch misprediction penalties or other types of stalls is very CPU-dependent.