
How is this GCC-generated strlen() MIPS loop not off-by-one?


Here is the source code for a very basic strlen() implementation.

#include <stddef.h>
#include <stdint.h>

extern uintptr_t lx_syscall3(uintptr_t a, uintptr_t b, uintptr_t c, uintptr_t nr);

static void lx_sys_exit(uintptr_t code)
{
  lx_syscall3(code, 0, 0, 4001);
  while (1);
}

static size_t lx_strlen(char const* s)
{
  size_t len = 0;

  while (*(s++)) {
    len++;
  }

  return len;
}

int main() {
  lx_sys_exit(lx_strlen("HELO"));
  while (1);
}

This was compiled together with a syscall.s file that is not relevant to the question. At -Os, GCC inlines lx_strlen into main, producing the following code:

004004fc <main>:
  4004fc: 3c1c000b  lui gp,0xb
  400500: 279c8154  addiu gp,gp,-32428
  400504: 0399e021  addu gp,gp,t9
  400508: 8f828034  lw v0,-32716(gp)
  40050c: 27bdffe0  addiu sp,sp,-32
  400510: 24424a64  addiu v0,v0,19044
  400514: afbc0010  sw gp,16(sp)
  400518: afbf001c  sw ra,28(sp)
  40051c: 00402825  move a1,v0
  400520: 00452023  subu a0,v0,a1

  # strlen loop block follows
  400524: 24420001  addiu v0,v0,1
  400528: 8043ffff  lb v1,-1(v0)
  40052c: 5460fffd  bnezl v1,400524 <main+0x28>
  400530: 00452023  subu a0,v0,a1

  400534: 8f998118  lw t9,-32488(gp)
  400538: 24070fa1  li a3,4001
  40053c: 00003025  move a2,zero
  400540: 04110093  bal 400790 <lx_syscall3>
  400544: 00002825  move a1,zero
  400548: 1000ffff  b 400548 <main+0x4c>
  40054c: 00000000  nop

When run with qemu-mipsel, the program correctly exits with status 4. So it works, but I don't understand how it possibly can. Notice the offset -1(v0) at 400528: the loop always checks the byte preceding the address stored in v0. By the time that byte is zero, v0 has advanced five bytes past the start of the string, so subtracting the original address should yield 5, not 4. Any idea how it works?


Solution

  • The code uses the bnezl (branch-likely) instruction, which handles its delay slot specially: the delay-slot instruction is executed only if the branch is taken. Hence $a0 always holds the value from the previous iteration, because the subu a0,v0,a1 at 400530 is not executed on the final iteration, the one that exits the loop. Note that $a0 is zeroed at 400520 (a1 was just copied from v0, so v0 - a1 is 0), which covers the case of a zero-length string.
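
To make the difference concrete, here is a rough C model of the loop at 400520..400530. It is only a sketch: the function names are made up for illustration, and the local variables simply mirror the register names in the disassembly. The first function models bnezl as described above (delay slot skipped when the branch falls through); the second models what the question assumed, an ordinary bnez whose delay slot always runs.

```c
#include <stddef.h>

/* Hypothetical model of the loop with bnezl (branch-likely):
 * the delay-slot subu runs only when the branch is taken. */
static size_t strlen_branch_likely(const char *s)
{
    const char *v0 = s;
    const char *a1 = s;                 /* 40051c: move a1,v0            */
    size_t a0 = (size_t)(v0 - a1);      /* 400520: subu a0,v0,a1  (= 0)  */

    for (;;) {
        v0 = v0 + 1;                    /* 400524: addiu v0,v0,1         */
        char v1 = v0[-1];               /* 400528: lb v1,-1(v0)          */
        if (v1 == 0)
            break;                      /* 40052c: not taken, slot skipped */
        a0 = (size_t)(v0 - a1);         /* 400530: delay slot, taken only */
    }
    return a0;
}

/* Hypothetical model of the same loop with a plain bnez:
 * the delay-slot subu runs on every iteration, including the last. */
static size_t strlen_plain_delay_slot(const char *s)
{
    const char *v0 = s;
    const char *a1 = s;
    size_t a0 = (size_t)(v0 - a1);

    for (;;) {
        v0 = v0 + 1;
        char v1 = v0[-1];
        a0 = (size_t)(v0 - a1);         /* always executed */
        if (v1 == 0)
            break;
    }
    return a0;
}
```

For "HELO", the first model returns 4 and the second returns 5, which is the off-by-one result the question expected. For an empty string, the first model returns the 0 computed before the loop.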